NOVEL HUMAN POLYPEPTIDES ENCODED BY
POLYNUCLEOTIDES

Priority Claim 

[001] This application is related to the following provisional applications 
filed in the United States Patent and Trademark Office, the disclosures of which are 
hereby incorporated by reference: 



Application Number  Title                                               Filing Date

60/406,616          Polynucleotides Encoding Secreted Proteins and      August 29, 2002
                    Secreted Proteins Encoded Thereby

60/406,655          Polynucleotides Encoding Single Transmembrane       August 29, 2002
                    Proteins And Single Transmembrane Proteins
                    Encoded Thereby

60/406,640          Polynucleotides Encoding Multiple Transmembrane     August 29, 2002
                    Proteins And Multiple Transmembrane Proteins
                    Encoded Thereby

60/406,576          Polynucleotides Encoding Kinases and Kinases        August 29, 2002
                    Encoded Thereby

60/406,666          Polynucleotides Encoding Proteases and Proteases    August 29, 2002
                    Encoded Thereby

60/406,611          Polynucleotides Encoding Phosphatases and           August 29, 2002
                    Phosphatases Encoded Thereby

60/406,612          Polynucleotides Encoding Polypeptides and           August 29, 2002
                    Polypeptides Encoded Thereby

60/411,019          Polynucleotides Encoding Secreted Proteins and      September 17, 2002
                    Secreted Proteins Encoded Thereby

60/411,024          Novel Polynucleotides Encoding Secreted Proteins    September 17, 2002
                    and Novel Secreted Proteins Encoded Thereby

60/411,046          Polynucleotides Encoding Single Transmembrane       September 17, 2002
                    Proteins and Single Transmembrane Proteins Encoded
                    Thereby

60/411,082          Novel Polynucleotides Encoding Single               September 17, 2002
                    Transmembrane Proteins and Novel Single
                    Transmembrane Proteins Encoded Thereby

60/411,022          Polynucleotides Encoding Multiple Transmembrane     September 17, 2002
                    Proteins and Multiple Transmembrane Proteins
                    Encoded Thereby

60/410,962          Novel Polynucleotides Encoding Multiple             September 17, 2002
                    Transmembrane Proteins and Novel Multiple
                    Transmembrane Proteins Encoded Thereby

60/410,953          Polynucleotides Encoding Kinases And Kinases        September 17, 2002
                    Encoded Thereby

60/410,957          Novel Polynucleotides Encoding Kinases and Novel    September 17, 2002
                    Kinases Encoded Thereby

60/411,037          Polynucleotides Encoding Phosphatases and           September 17, 2002
                    Phosphatases Encoded Thereby

60/410,951          Novel Polynucleotides Encoding Phosphatases and     September 17, 2002
                    Novel Phosphatases Encoded Thereby

60/410,946          Polynucleotides Encoding Proteases and Proteases    September 17, 2002
                    Encoded Thereby

60/410,960          Novel Polynucleotides Encoding Proteases and Novel  September 17, 2002
                    Proteases Encoded Thereby

60/411,111          Polynucleotides Encoding Polypeptides and           September 17, 2002
                    Polypeptides Encoded Thereby

60/411,052          Novel Polynucleotides Encoding Proteins and Novel   September 17, 2002
                    Proteins Encoded Thereby


Technical Field

[002] The present invention is related generally to novel
polynucleotides and novel polypeptides encoded thereby, their compositions,
antibodies directed thereto, and other agonists or antagonists thereto. The
polynucleotides and polypeptides are useful in diagnostic, prophylactic, and
therapeutic applications for a variety of diseases, disorders, syndromes, and
conditions, as well as in discovering new diagnostics, prophylactics, and therapeutics
for such diseases, disorders, syndromes, and conditions (hereinafter "disorders").

WO 2004/020595    PCT/US2003/027107

[003] This application further relates to the field of polypeptides that
are associated with regulating cell growth and differentiation, that are over-expressed
in cancer, and/or that can be associated with proliferation or inhibition of cancer
growth, including hematopoietic cancers such as leukemias and lymphomas, and solid
cancers such as lung cancer, for example, adenocarcinomas and/or squamous cell
carcinomas. These polypeptides may also be associated with other conditions, such as
inflammatory, immune, and metabolic disorders, as well as microbial infections,
including viral, bacterial, fungal, and parasitic diseases, disorders, syndromes, or
conditions.

[004] This application further relates to modulators of biological
activity that can specifically bind to these polynucleotides or polypeptides, or
otherwise specifically modulate their activity. For example, they can directly or
indirectly induce antibody-dependent cellular cytotoxicity (ADCC), complement-dependent
cytotoxicity (CDC), endocytosis, apoptosis, or recruitment of other cells to
effect cell activation, cell inactivation, cell growth or differentiation or inhibition
thereof, and cell killing.

[005] The sequences of the invention encompass a variety of different
types of nucleic acids and polypeptides with different structures and functions. They
can encode or comprise polypeptides belonging to different protein families ("Pfam").
The "Pfam" system is an organization of protein sequence classification and analysis,
based on conserved protein domains; it can be publicly accessed in a number of ways,
for example, at http://pfam.wustl.edu. Protein domains are portions of proteins that
have a tertiary structure and sometimes have enzymatic or binding activities; multiple
domains can be connected by flexible polypeptide regions within a protein. Pfam
domains can comprise the N-terminus or the C-terminus of a protein, or can be 
situated at any point in between. The Pfam system identifies protein families based 
on these domains and provides an annotated, searchable database that classifies 
proteins into families (Bateman et al., 2002). 

[006] Sequences of the invention can encode or be comprised of more 
than one Pfam. Sequences encompassed by the invention include, but are not limited 
to, the polypeptide and polynucleotide sequences of the molecules shown in the 
Sequence Listing and corresponding molecular sequences found at all developmental 
stages of an organism. Sequences of the invention can comprise genes or gene
segments designated by the Sequence Listing, and their gene products, i.e., RNA and
polypeptides. They also include variants of those presented in the Sequence Listing
that are present in the normal physiological state, e.g., variant alleles such as SNPs,
splice variants, as well as variants that are affected in pathological states, such as 
disease-related mutations or sequences with alterations that lead to pathology, and 
variants with conservative amino acid changes. Sequences of the invention are 
categorized below; any given sequence can belong to one or more than one category. 
Secreted Protein-Related Sequences 

[007] Secreted proteins, also referred to as secreted factors, include proteins
that are produced by cells and exported extracellularly, extracellular fragments of
transmembrane proteins that are proteolytically cleaved, and extracellular fragments
of cell surface receptors, which fragments may be soluble. An example of a secreted
protein is keratinocyte growth factor (KGF), which stimulates
keratinocytes, and is useful for repairing tissue after chemotherapy or radiotherapy.

[008] Many and widely variant biological functions are mediated by a wide
variety of different types of secreted proteins. Yet, despite the sequencing of the
human genome, relatively few pharmaceutically useful secreted proteins have been
identified. It would be advantageous to discover novel secreted proteins or
polypeptides, and their corresponding polynucleotides, that have medical utility.

[009] Pharmaceutically useful secreted proteins of the present invention 
will have in common the ability to act as ligands for binding to receptors on cell 
surfaces in ligand/receptor interactions, to trigger certain intracellular responses, such 
as inducing signal transduction to activate cells or inhibit cellular activity, to induce 
cellular growth, proliferation, or differentiation, or to induce the production of other
factors that, in turn, mediate such activities.

[010] The cell types having cell surface receptors responsive to secreted
proteins are various, including, for example, stem cells; progenitor cells; and
precursor cells and mature cells of the hematopoietic, hepatic, neural, lung, heart,
thymic, splenic, epithelial, pancreatic, adipose, gastrointestinal, colonic, optic,
olfactory, bone and musculoskeletal lineages. Further, the hematopoietic cells can be
red blood cells or white blood cells, including cells of the B lymphocytic (B cell), T
lymphocytic (T cell), dendritic, megakaryocytic, natural killer (NK), macrophagic,
eosinophilic, and basophilic lineages. The cell types responsive to secreted proteins
also include normal cells or cells implicated in disorders or other pathological
conditions.

[011] As an example, certain of the secreted proteins of the present
invention can stimulate T or B cell growth or differentiation by interacting with
precursor T or B cells or hematopoietic progenitor cells, or bone marrow stem cells.
As another example, certain secreted proteins of the present invention can maintain
stem cells, progenitor cells or precursor cells in an undifferentiated state. As a further
example, certain secreted proteins of the present invention can regulate bone growth
by stimulation or inhibition thereof, secretion of insulin, glucose metabolism, cell
proliferation, response to microbial infection, and regeneration of tissues including
neural, muscular, and epithelial. Moreover, certain secreted proteins of the present
invention can induce apoptosis, such as in cancer cells or inflammatory cells.

[012] Certain of the secreted proteins of the present invention are useful for
diagnosis, prophylaxis, or treatment of disorders in subjects that are deficient in such
secreted proteins, or require regeneration of certain tissues, the proliferation of which
is dependent on such secreted proteins, or require an inhibition or activation of
growth that is dependent on such secreted proteins. Examples of such disorders
include cancer, such as bone cancer, brain tumors, breast and ovarian cancer, Burkitt's
lymphoma, chronic myeloid leukemia, colon cancer, endocrine system cancers,
gastrointestinal cancers, gynecological cancers, head and neck cancers, leukemia,
lung cancer, lymphomas, malignant melanoma, metastases, multiple endocrine
neoplasia, myelomas, neurofibromatosis, pancreatic cancer, pediatric cancers, penile
cancer, prostate cancer, disorders related to the Ras oncogene, retinoblastoma (RB),
sarcomas, skin cancers, testicular cancer, thyroid cancer, urinary tract cancers, and
von Hippel-Lindau syndrome.

[013] Certain of the secreted proteins herein can be used for diagnosis,
prophylaxis, and treatment of disorders of hematopoiesis, including thrombosis;
bleeding; anemias, e.g., iron deficiency and other hypoproliferative anemias,
megaloblastic anemias, hemolytic anemias, acute blood loss, and aplastic anemia;
hemoglobinopathies; disorders of granulocytes and monocytes; myelodysplasias and
related bone marrow failure syndromes; polycythemias, e.g., polycythemia vera; acute
and chronic myeloid leukemia, and other myeloproliferative diseases, e.g.,
malignancies of lymphoid cells; stimulation of replacement cell growth following
irradiation or chemotherapy; and plasma cell disorders.

[014] Certain of the secreted proteins herein can be used for diagnosis,
prophylaxis, and treatment of disorders of hemostasis, such as disorders of the platelet
and vessel wall, disorders of coagulation and thrombosis, and anticoagulant,
fibrinolytic and antiplatelet therapies.

[015] Certain of the secreted proteins herein can be used for diagnosis, 
prophylaxis, and treatment of disorders of the cardiovascular system including 
disorders of the heart, such as heart failure; congenital heart disease; rheumatic fever;
cor pulmonale; cardiomyopathies, e.g., myocarditis; pericardial disease; cardiac
tumors; cardiac manifestations of systemic diseases; and vascular diseases, such as
acute myocardial infarction, ischemic heart disease, hypertensive vascular disease,
diseases of the aorta, and vascular diseases of the extremities.

[016] Certain of the secreted proteins herein can be used for diagnosis, 
prophylaxis, and treatment of disorders of the respiratory system, such as asthma,
hypersensitivity pneumonitis, e.g., with pulmonary infiltration, pneumonia, 
necrotizing pulmonary infections, bronchiectasis, cystic fibrosis, chronic bronchitis, 
emphysema and airway obstruction, interstitial lung diseases, primary pulmonary 
hypertension, pulmonary thromboembolism, disorders of the pleura, mediastinum, 
and diaphragm, disorders of ventilation, sleep apnea, and acute respiratory distress 
syndrome. 

[017] Certain of the secreted proteins herein can be used for diagnosis,
prophylaxis, and treatment of disorders of the kidney and urinary tract, such as, for
example, chronic renal failure and glomerulopathies.

[018] Certain of the secreted proteins herein can be used for diagnosis,
prophylaxis, and treatment of disorders of the gastrointestinal system, including
disorders of the alimentary tract, such as, for example, peptic ulcer disease and related
disorders, inflammatory bowel disease, irritable bowel syndrome; disorders of the 
liver and biliary tract, such as, for example, hyperbilirubinemias, acute viral hepatitis, 
chronic hepatitis, and cirrhosis; and disorders of the pancreas, such as acute or chronic 
pancreatitis. 

[019] Certain of the secreted proteins herein can be used for diagnosis, 
prophylaxis, and treatment of disorders of the immune system, connective tissue, and
joints, including, for example, autoimmune diseases, primary immune deficiency
diseases, human immunodeficiency virus diseases, allergies, systemic lupus
erythematosus, rheumatoid arthritis, systemic sclerosis, Sjogren's syndrome,
ankylosing spondylitis, reactive arthritis, vasculitis, sarcoidosis, amyloidosis,
osteoarthritis, gout, psoriatic arthritis, and other arthritis.

[020] Certain of the secreted proteins herein can be used for diagnosis, 
prophylaxis, and treatment of disorders of the endocrine system, including, for 
example, disorders of the pituitary, hypothalamus, neurohypophysis, thyroid gland, 
adrenal cortex, testes, ovary, and other organs of the female reproductive system, such
as breast; as well as pheochromocytoma, diabetes mellitus, and hypoglycemia. 

[021] Certain of the secreted proteins herein can be used for diagnosis,
prophylaxis, and treatment of disorders of bone and mineral metabolism, and other
metabolic processes, including, for example, diseases of the parathyroid gland and
other hyper- and hypocalcemic disorders, osteoporosis, Paget's disease and other
dysplasia of bone, disorders of lipoprotein metabolism, hemochromatosis, porphyrias,
disorders of purine and pyrimidine metabolism, Wilson's disease, lysosomal storage
diseases, glycogen storage diseases, lipodystrophies, and other primary disorders of 
adipose tissue. 

[022] Certain of the secreted proteins herein can be used for diagnosis,
prophylaxis, and treatment of disorders of the central nervous system, including, for
example, seizures and epilepsy, cerebrovascular diseases, Alzheimer's disease and
other extrapyramidal disorders, ataxic disorders, amyotrophic lateral sclerosis and
other motor neuron diseases, disorders of the autonomic nervous system, diseases of
the spinal cord, including spinal cord injury, primary and metastatic tumors of the
nervous system, multiple sclerosis, and other demyelinating diseases, as well as
chronic and recurrent meningitis. 

[023] Certain of the secreted proteins herein can be used for diagnosis,
prophylaxis, and treatment of disorders of nerves or muscle, including, for example,
Guillain-Barré Syndrome, myasthenia gravis and other diseases of the neuromuscular
junction, polymyositis, dermatomyositis, muscular dystrophies, and other muscle
diseases. 

[024] Certain of the secreted proteins herein can be used for diagnosis, 
prophylaxis, and treatment of disorders of the skin, including, for example, eczema, 
psoriasis, cutaneous infections, acne, and other common skin disorders, and 
immunologically mediated skin diseases. 

[025] The agonists or antagonists of the secreted proteins herein or
fragments thereof can be useful in treating elevated levels of such proteins in any of the
disorders above, and including angina, anoxia, arrhythmias, asthma, atherosclerosis,
benign prostatic hyperplasia, Buerger's Disease, cardiac arrest, cardiogenic shock,
cerebral trauma, Crohn's Disease, congenital heart disease, mild congestive heart
failure (CHF), severe congestive heart failure, cerebral ischemia, cerebral infarction,
cerebral vasospasm, cirrhosis, diabetes, dilated cardiomyopathy, endotoxic shock,
gastric mucosal damage, glaucoma, head injury, hemodialysis, hemorrhagic shock,
hypertension (essential), hypertension (malignant), hypertension (pulmonary),
hypertension (e.g., pulmonary, after bypass), hypoglycemia, inflammatory arthritis,
ischemic bowel disease, ischemic disease, male penile erectile dysfunction, malignant
hemangioendothelioma, myocardial infarction, myocardial ischemia, prenatal
asphyxia, postoperative cardiac surgery, prostate cancer, preeclampsia, Raynaud's
Phenomenon, renal failure (acute), renal failure (chronic), renal ischemia, restenosis,
sepsis syndrome, subarachnoid hemorrhage (acute), surgical operations, status
epilepticus, stroke (thromboembolic), stroke (hemorrhagic), Takayasu's arteritis,
ulcerative colitis, uremia after hemodialysis, and uremia before hemodialysis.

[026] Secreted proteins can be screened for functional activities in
appropriate functional assays, as is conventional in the art. Such assays include, for
example, in vitro and in vivo assays for factors that stimulate the proliferation or 
differentiation of stem cells, progenitor cells, or precursor cells into T cells, B cells, 
pancreatic islet cells, bone cells, neuronal cells, etc. 

[027] The tetratricopeptide repeat (TPR) is an example of a protein domain 
characteristic of a protein family, and is present in some of the secreted polypeptides 
of the invention. The TPR family is characterized by a degenerate 34 amino acid 
sequence present in a wide variety of proteins; it mediates protein-protein interactions, 
and is involved in scaffold formation and the assembly of multiprotein complexes 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=TPR). Secreted protein-related
sequences can also possess or interact with cytochrome P450 domains, which are
involved in the oxidative degradation of various compounds, including environmental
toxins and mutagens (http://pfam.wustl.edu/cgi-bin/getdesc?name=p450). Secreted
protein-related sequences, e.g., cholesteryl ester transfer protein and phospholipid
transfer protein, can also possess or interact with the LBP/BPI/CETP domain, which
is characteristically found in lipid-binding serum glycoproteins
(http://pfam.wustl.edu/cgi-bin/getdesc?name=LBP_BPI_CETP). Secreted protein-related
sequences can also possess or interact with peptidase S8 domains, also known as
subtilase domains, which are comprised of serine proteases with a wide range of
peptidase activities, including exopeptidase, endopeptidase, oligopeptidase, and
omega-peptidase activity (http://pfam.wustl.edu/cgi-bin/getdesc?name=Peptid[...]).
Secreted protein-related sequences can also possess or interact with adh_short, or
short-chain dehydrogenase domains, which are found in a large family of proteins, and
are made up of short-chain dehydrogenases and reductase enzymes; most family members
function as NAD- or NADP-dependent oxidoreductases
(http://pfam.wustl.edu/cgi-bin/getdesc?name=adh_short).

[028] The inventors herein have identified novel secreted proteins using an 
algorithm that is constructed on the basis of a number of attributes including
hydrophobicity, two-dimensional structure, prediction of signal sequence cleavage 
site, and other parameters. Based on such algorithm, a sequence that has a secreted 
tree vote of 0.5 - 1.0, preferably, 0.6 - 1.0, is believed to be a secreted protein. 
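The thresholds stated above can be sketched as a simple decision rule. This is only an illustration under stated assumptions: the function name `classify_secreted` and the returned labels are hypothetical, the vote is assumed to be a number between 0 and 1, and the algorithm that produces the vote is not reproduced here.

```python
def classify_secreted(tree_vote: float) -> str:
    """Label a sequence from its secreted tree vote.

    Assumption: the vote lies in [0, 1]. The cut-offs 0.5 and 0.6 come
    from the text; the label strings are illustrative only.
    """
    if not 0.0 <= tree_vote <= 1.0:
        raise ValueError("tree vote must be between 0 and 1")
    if tree_vote >= 0.6:
        # Falls in the preferred range of 0.6 - 1.0.
        return "secreted (preferred range)"
    if tree_vote >= 0.5:
        # Falls in the broader range of 0.5 - 1.0.
        return "secreted"
    return "not predicted secreted"


if __name__ == "__main__":
    for vote in (0.42, 0.55, 0.91):
        print(vote, classify_secreted(vote))
```

A sequence scoring 0.55 would fall in the broader range only, whereas one scoring 0.91 would also fall in the preferred range.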
Transmembrane Protein-Related Sequences

[029] Transmembrane proteins extend into or through the cell membrane's 
lipid bilayer; they can span the membrane once, or more than once. Transmembrane
proteins that span the membrane once are "single transmembrane proteins" (STM),
and transmembrane proteins that span the membrane more than once are "multiple
transmembrane proteins" (MTM). Examples of transmembrane proteins include the 
insulin receptor, adenylate cyclase, and intestinal brush border esterase. 

[030] A single transmembrane protein typically has one transmembrane 
(TM) domain, spanning a series of consecutive amino acid residues, numbered on the 
basis of distance from the N-terminus, with the first amino acid residue at the
N-terminus as number 1. A multi-transmembrane protein typically has more than one
TM domain, each spanning a series of consecutive amino acid residues, numbered in
the same way as the STM protein.
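The numbering convention and the STM/MTM distinction can be illustrated with a short sketch. It assumes, hypothetically, that each TM domain is given as an inclusive (start, end) pair of residue numbers counted from 1 at the N-terminus; the function names and the example sequence are illustrative and do not come from the Sequence Listing.

```python
from typing import List, Tuple

TMDomain = Tuple[int, int]  # inclusive (start, end), residue 1 = N-terminus


def classify_transmembrane(tm_domains: List[TMDomain]) -> str:
    """Return 'STM' for one TM domain, 'MTM' for more than one, else 'none'."""
    for start, end in tm_domains:
        if start < 1 or end < start:
            raise ValueError(f"invalid TM domain: ({start}, {end})")
    if not tm_domains:
        return "none"
    return "STM" if len(tm_domains) == 1 else "MTM"


def tm_residues(sequence: str, tm_domains: List[TMDomain]) -> List[str]:
    """Extract the residues of each TM domain using 1-based numbering."""
    # Residue number n sits at Python index n - 1, so an inclusive
    # range start..end maps to the slice [start - 1 : end].
    return [sequence[start - 1:end] for start, end in tm_domains]


if __name__ == "__main__":
    toy_sequence = "MKTLLVAAGL"  # made-up sequence for illustration
    print(classify_transmembrane([(3, 6)]))          # one TM domain
    print(classify_transmembrane([(3, 6), (8, 10)]))  # two TM domains
    print(tm_residues(toy_sequence, [(3, 6)]))
```

Keeping the coordinates 1-based and inclusive mirrors the convention in the text, so the only conversion needed is at the point of slicing.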

[031] Transmembrane proteins, having part of their molecules on either
side of the bilayer, have many and widely variant biological functions. They
transport molecules, e.g., ions or proteins across membranes, transduce signals across
membranes, act as receptors, and function as antigens. Transmembrane proteins are 
often involved in cell signaling events; they can comprise signaling molecules, or can 
interact with signaling molecules. For example, tyrosine kinases can be 
transmembrane receptor proteins. Abnormalities of receptor tyrosine kinases are 
associated with human cancers; tumor cells are known to use receptor tyrosine kinases 
in transduction pathways to achieve tumor growth, angiogenesis and metastasis.
Therefore, receptor tyrosine kinases represent pivotal targets in cancer therapy. It
would be similarly advantageous to discover novel transmembrane proteins or
polypeptides, and their corresponding polynucleotides that have additional medical 
utility. 

[032] The transmembrane polypeptides of the invention, like the secreted 
polypeptides, also have many different functional domains, and belong to a wide 
variety of Pfam families. Transmembrane protein-related sequences can possess or 
interact with immunoglobulin (ig) domains, which are characteristically found in the
immunoglobulin superfamily, comprised of hundreds of proteins, with various
functions (http://pfam.wustl.edu/cgi-bin/getdesc?name=ig). Transmembrane protein-related
sequences can also possess or interact with ion_trans domains, which are
polypeptides characterized by six transmembrane helices, and which transport ions
across membranes (http://pfam.wustl.edu/cgi-bin/getdesc?name=ion_trans). Proteins
in this family can demonstrate specificity for particular ions, e.g., sodium, potassium,
and calcium. Transmembrane protein-related sequences can also possess or interact
with integrase core domains, which mediate the integration of a DNA copy of a viral
genome into a host chromosome; e.g., HIV integrase catalyzes the incorporation of
virally derived DNA into the human genome, presenting a target for the development
of new therapeutics for the treatment of AIDS
(http://pfam.wustl.edu/cgi-bin/getdesc?name=rve). Transmembrane protein-related
sequences can also possess or interact with domains designated as "differentially
expressed in neoplastic vs. normal cells" (DENN) domains, which are involved in signal
transduction. Characteristically, these domains are found in protein components of
signaling pathways that utilize rab proteins or mitogen-activated protein (MAP) kinases
(http://pfam.wustl.edu/cgi-bin/getdesc?name=DENN).

[033] Transmembrane protein-related sequences can also possess or interact
with acyl CoA binding protein (ACBP) domains, which are protein domains that bind
medium- and long-chain acyl-CoA esters with high affinity
(http://pfam.wustl.edu/cgi-bin/getdesc?name=ACBP). Membrane-related sequences can also
possess or interact with SPFH domain/Band 7 family (Band_7) domains, which are protein
domains that include a transmembrane segment, and regulate cation conductivity
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Band_7).

[034] Transmembrane proteins that are differentially expressed on the
surface of cancer cells, particularly those that are differentially expressed on the
surface of cancer cells but not on the surface of normal tissues, such as heart and lung,
are desirable targets for production of antibodies, e.g., diagnostic antibodies or
therapeutic antibodies, such as antibodies that mediate ADCC or CDC to effect tumor
cell killing.

[035] Transmembrane proteins with extracellular fragments that can be
cleaved can be useful as secreted proteins to effect ligand/receptor binding so as to
mediate intracellular responses, such as signal transduction. Transmembrane proteins
that act as receptors, and possess a ligand-binding extracellular portion exposed on a
cell surface and an intracellular portion that interacts with other cellular components
upon activation, can also be useful as transmembrane proteins to mediate
intracellular responses, such as signal transduction.
Kinase-Related Sequences 

[036] A kinase is an enzyme that catalyzes the transfer of phosphate groups
from phosphate donors to acceptor substrates. Kinase substrates include, but are not
limited to, proteins and lipids. Sequences of the invention that phosphorylate protein
substrates are designated "Pkinases." Examples of kinase-related sequences include
calcium/calmodulin-dependent protein kinase II, myosin light chain kinase, and
phosphatidylinositol kinase.

[037] Kinases and phosphatases are counteracting: kinases add phosphate
groups and phosphatases liberate phosphate groups. The counteracting activities of
kinases and phosphatases provide cells with a "switch" that can turn on or turn off the
function of various proteins. The activity of any protein regulated by phosphorylation
depends on the balance, at any given time, between the activities of the kinase(s) that
phosphorylate it, and the phosphatase(s) that dephosphorylate it. Phosphorylation
plays an important role in intercellular communication during development,
homeostasis, and the function of major bodily systems, including the immune system.
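The idea of a balance between kinase and phosphatase activity can be made concrete with a deliberately simplified sketch. It assumes a two-state model that is not part of this disclosure: a protein is either phosphorylated or not, the kinase phosphorylates it with a first-order rate constant, and the phosphatase dephosphorylates it with another. Under that assumption the steady-state phosphorylated fraction is kinase_rate / (kinase_rate + phosphatase_rate). The function name and rate values are hypothetical.

```python
def phosphorylated_fraction(kinase_rate: float, phosphatase_rate: float) -> float:
    """Steady-state phosphorylated fraction in an assumed two-state model.

    Setting d(p)/dt = kinase_rate * (1 - p) - phosphatase_rate * p to zero
    gives p = kinase_rate / (kinase_rate + phosphatase_rate).
    """
    if kinase_rate < 0 or phosphatase_rate < 0:
        raise ValueError("rates must be non-negative")
    total = kinase_rate + phosphatase_rate
    if total == 0:
        raise ValueError("at least one rate must be positive")
    return kinase_rate / total


if __name__ == "__main__":
    # Kinase activity dominating: most of the protein is phosphorylated.
    print(phosphorylated_fraction(3.0, 1.0))
    # Phosphatase activity dominating: most of the protein is dephosphorylated.
    print(phosphorylated_fraction(1.0, 3.0))
```

In this toy model, shifting either rate moves the fraction toward the "on" or "off" side, which is one way to picture the switch described above; real signaling networks involve many more interacting components.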

[038] In conjunction with phosphatases, kinases control such diverse and
essential cellular processes as transcription, cell division, cell cycle progression,
differentiation, cytoskeletal function, apoptosis, receptor function, learning and
memory, hematopoiesis, fertilization, neural transmission, muscle contraction,
non-muscle motor function, glycogen metabolism, and hormone secretion.

[039] Most kinases act within a network of kinases and other signaling
effectors, and are modulated by autophosphorylation and phosphorylation by other
kinases (Manning et al., 2002). Intracellular signaling involves a multitude of diverse
mechanisms that combine to modulate the activity of individual proteins in response
to different biological inputs.

[040] Defects in cell signal transduction pathways are responsible for a 
number of disorders, including the majority of cancers, immune disorders, and many 
inflammatory conditions, including, but not limited to, Crohn's disease (Geffen and
Man, 2002; Van Den Blink et al., 2002; Lodish 1999). Over-expression and/or
structural alteration of kinases, for example, receptor tyrosine kinase family members,
is often associated with human cancers. For example, tumor cells are known to use
receptor tyrosine kinases in transduction pathways to achieve tumor growth,
angiogenesis and metastasis. Therefore, receptor tyrosine kinases represent pivotal
targets in cancer therapy. A number of small molecule receptor tyrosine kinase
inhibitors have been synthesized, are in clinical trials, are being analyzed in animal
models, or have been marketed. Inhibitory mechanisms include ligand-dependent
down regulation, e.g., by the adaptor Cbl (Brunelleschi et al., 2002). 

[041] Kinase-related sequences can possess or interact with protein kinase
(pkinase) domains, which share a conserved catalytic core common in
serine/threonine and tyrosine protein kinases
(http://pfam.wustl.edu/cgi-bin/getdesc?name=pkinase). Kinase-related sequences can also
possess or interact with A-kinase anchoring protein 95 (AKAP95) domains, which comprise
two zinc fingers, and have been implicated in chromosome condensation
(http://pfam.wustl.edu/cgi-bin/getdesc?name=AKAP95). Kinase-related sequences can also
possess or interact with inositol 1,3,4-trisphosphate 5/6 kinase (Ins134_P3_kin) domains,
which mediate the function of inositol 1,3,4-trisphosphate, a branch point in inositol
phosphate metabolism (http://pfam.wustl.edu/cgi-bin/getdesc?name=Ins134_P3_kin).

[042] Kinases, by virtue of their participation in many and varied
intracellular activities, are useful as targets of therapeutic intervention such as, for
example, in cancer and inflammation. Cells transfected with cDNA encoding a kinase
can be used in screening for small molecule agonists or antagonists, for example.
Ligase-Related Sequences 

[043] Ligases are enzymes that join together, or ligate, two molecules.
Ligase substrates include nucleic acids and proteins. For example, DNA ligases link
two DNA molecules together; they play a role in DNA repair and replication. DNA
ligases also are involved in the rearrangement of immunoglobulin gene segments,
such as those responsible for the generation of antibody diversity. Examples of
protein ligases include ubiquitin protein ligases, which add a ubiquitin molecule to
an amino acid residue, typically as part of a peptide or polypeptide. Examples of
nucleic acid ligases include DNA ligase I, DNA ligase III alpha, and T4 RNA
ligase 2.

[044] Ligases are also involved in cellular regulatory processes. For 
example, glutamate-cysteine ligase (GCL) is the first and rate-limiting enzyme
involved in the biosynthesis of glutathione. Polymorphisms of human GCL account
for differences in sensitivity to environmental toxicants and chemotherapeutic agents
in human cancer cell lines (Walsh et al., 2001). Also by way of example,
glutamate-ammonia ligase, or glutamine synthetase (GS), is expressed at a higher than
normal level in human primary liver cancer, and may be involved in hepatocyte
transformation (Christa et al., 1994).

[045] Ligase-related sequences can possess or interact with ATP dependent
DNA ligase (DNA_ligase) domains, which can join two DNA fragments by
catalyzing the formation of an internucleotide ester bond between a phosphate and a
deoxyribose (http://pfam.wustl.edu/cgi-bin/getdesc?name=DNA_ligase). Ligase-related
sequences can also possess or interact with glutamate-cysteine ligase (GCS)
domains, which catalyze the rate-limiting step in the biosynthesis of glutathione
(http://pfam.wustl.edu/cgi-bin/getdesc?name=GCS). Ligase-related sequences can
also possess or interact with 2',5' RNA ligase (2_5_ligase) domains, which ligate
tRNA half molecules containing 2',3'-cyclic phosphate and 5'-hydroxyl termini to
products containing a 2',5'-phosphodiester linkage
(http://pfam.wustl.edu/cgi-bin/getdesc?name=2_5_ligase).

[046] Like kinases, ligases are also useful as targets for identification of
agonists and antagonists, such as small molecule drugs.

Receptor-Related Sequences (Including Nuclear Hormone and T-Cell Receptors) 

[047] A receptor is a polypeptide that binds to a specific signaling
molecule and initiates a cellular response. Receptors can be present on the cell
surface or inside the cell. Examples of receptor types include G-protein-linked
receptors, ion channel-linked receptors, enzyme-linked receptors, T-cell receptors,
thyroid hormone receptors, retinoid receptors, nuclear hormone receptors, and the
related category of steroid hormone receptors, e.g., cortisol receptors (Alberts et al.,
1994).

[048] G-protein-linked receptors transduce extracellular signals into
intracellular responses by interacting with guanine nucleotide binding proteins. The
same ligand can activate many different G-protein-linked receptors. G-protein-linked
receptors mediate cellular responses to a diverse range of signaling molecules,
including hormones, neurotransmitters, and local mediators, which are varied in
structure and function, and encompass proteins and small peptides, as well as amino
acids and their derivatives, and fatty acids and their derivatives. Many signaling
molecules are active at low concentrations, and their receptors often bind with high
affinity. Examples of G-protein-linked receptors include, but are not limited to,
rhodopsins, olfactory receptors, and β-adrenergic receptors.

[049] Ion channel-linked receptors are involved in synaptic signaling.
These receptors regulate ion channels, to which they are linked. Some respond to
signals from neurotransmitters, e.g., acetylcholine, serotonin, GABA, and glycine. A
common mechanism of action for ion channel-linked receptors is to transiently open
or close their respective ion channel, transiently changing the permeability of the
membrane in which they reside to a specific ion or ions.

[050] Enzyme-linked receptors can be linked to enzymes or can
function as enzymes. Their ligand binding site is commonly on one side of the
membrane, e.g., an extracellular domain, and the catalytic site is on the other, e.g., a
cytoplasmic domain. Transmembrane tyrosine-specific protein kinase receptors for
growth and differentiation factors are enzyme-linked receptors; examples include
receptors for epidermal growth factor (EGF), platelet-derived growth factor (PDGF),
fibroblast growth factors (FGFs), hepatocyte growth factors (HGF), insulin,
insulin-like growth factor-1 (IGF-1), nerve growth factor (NGF), vascular endothelial
growth factor (VEGF), and macrophage colony stimulating factor (M-CSF).

[051] Nuclear hormone receptors generally function by crossing the
plasma membrane of target cells and binding to intracellular protein ligands. Ligand
binding activates these receptors in some instances, exposing a DNA binding domain
which regulates the transcription of specific genes. Generally, nuclear hormone
receptors bind to specific DNA sequences adjacent to or in the vicinity of the genes
regulated by their ligand. A host of cell type-specific regulatory proteins can
collaborate with the nuclear hormone receptor to influence the transcription of
specific genes or sets of genes (Alberts et al., 1994). Examples of nuclear hormone
receptors include estrogen-related receptors, such as hERR1, which modulates the
estrogen receptor-mediated response of the lactoferrin gene promoter (Yang et al.,
1996), and is a transcriptional regulator of the human medium chain acyl coenzyme A
dehydrogenase gene (Sladek et al., 1997). Examples of nuclear hormone receptors
also include photoreceptor-specific nuclear receptors, such as NR2E3, which are part
of a large family of nuclear receptor transcription factors involved in signaling
pathways. NR2E3 plays a role in cone function and human retinal photoreceptor
differentiation and degeneration (Milam et al., 2002; Kobayashi et al., 1999).

[052] T-cell receptors are membrane proteins composed of two
disulfide-linked polypeptide chains, each with two immunoglobulin-like domains.
They display a similarity to antibodies in that they have a variable amino-terminal
region and a constant carboxyl-terminal region, which are coded for by variable,
joining, and constant region genes (Wei et al., 1997; Alberts et al., 1994).
Rearrangements of T-cell receptor genes have been associated with human T-cell
leukemias (Fisch et al., 1993).

[053] Receptors are involved in cellular processes that regulate growth
and differentiation. Their dysregulation can lead to hyperproliferative conditions, and
they are common therapeutic targets. For example, the EGF receptor is aberrantly
activated in neoplasia, especially in tumors of epithelial origin. EGF receptor
antagonists can successfully treat some of these tumors, either alone or in
combination with chemotherapy or ionizing radiation (Kari et al., 2003). The
progesterone receptor, an intracellular steroid hormone receptor, plays a role in the
development and function of the mammary gland, the uterus, and the ovary. Mutation
or aberrant expression of the progesterone receptor, or its regulatory molecules, can
affect its normal function and lead to cancer (Gao and Nawaz, 2002).

[054] Receptors are also involved in cellular processes that regulate
inflammation and immunity. For example, members of the type 1 interleukin-1
receptor family mediate immune and inflammatory responses, and function in host
defense (O'Neill, 2002). Their activation can lead to the activation of signaling
cascades, e.g., pathways involving transcription factors and protein kinases, resulting
in an inflammatory response (O'Neill, 2002). Another mechanism by which receptors
regulate inflammation and immunity is by their selective expression, at discrete stages
of differentiation, by cells involved in the inflammatory response. For example,
expression of the triggering receptor expressed on myeloid cells (TREM-1) and the
myeloid DAP12-associating lectin (MDL-1) is correlated with myelomonocytic
differentiation. These receptors are more highly expressed in differentiated cells, are
involved in monocyte activation and the inflammatory response, and are expressed at
a lower level in malignant compared to normal cells (Gingras et al., 2002).

[055] Receptor-related sequences can possess or interact with seven
transmembrane receptor (7tm_1) domains, which are protein domains with a
structural framework comprising seven transmembrane helices found in receptors,
e.g., receptors in the rhodopsin family with a wide range of functions, activated by
ligands that vary widely in structure and character
(http://pfam.wustl.edu/cgi-bin/getdesc?name=7tm_1). Receptor-related sequences
can also possess or interact with L1 transposable element (transposase_22) domains,
some of which have been characterized to exhibit reverse transcriptase activity, and
some of which are capable of retrotransposition. Receptor-related sequences can also
possess or interact with an SH2 domain, which is a protein domain of about 100 amino
acid residues found in many intracellular signal-transducing proteins, that can regulate
intracellular signaling cascades by interacting with phosphotyrosine-containing target
peptides in a sequence-specific and phosphorylation-dependent manner
(http://pfam.wustl.edu/cgi-bin/getdesc?name=SH2). Receptor-related sequences can
also possess or interact with LDL receptor domains, e.g., the low-density lipoprotein
receptor repeat class B (Ldl_recept_b) domain, which comprises a conserved YWTD
motif in multiple tandem repeats
(http://pfam.wustl.edu/cgi-bin/getdesc?name=ldl_recept_b). Receptor-related
sequences can also possess or interact with ribosomal L10 (Ribosomal_L10e)
domains, which are protein domains commonly found in the large ribosomal subunit
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Ribosomal_L10e).

[056] Receptor-related sequences can possess or interact with zinc
finger C4 type domains, which are DNA binding domains of nuclear hormone
receptors that share a conserved cysteine-rich region of approximately 65 amino acids
and regulate such diverse biological processes as pattern formation, cellular
differentiation, and homeostasis
(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00105). Receptor-related
sequences can also possess or interact with a ligand binding domain of nuclear
hormone receptors (hormone_rec), which are helical domains involved in the
regulation of eukaryotic gene expression, cellular proliferation, and differentiation in
target tissues (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00104).
Receptor-related sequences can also possess or interact with Mov34 domains, which
are regulatory subunits of the proteasome found in some regulators of transcription
factors (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF01398). Receptor-related
sequences can also possess or interact with immunoglobulin domains, which are
described above.

[057] Receptors, and fragments of receptors, can be used as
therapeutics. For example, a ligand-binding portion, an effector-binding portion, and
a kinase or phosphatase domain or consensus sequence can comprise fragments that
can function as agonists or antagonists that enhance or reduce, e.g., ligand binding to
the natural receptors, or effector function by the natural receptors.
Phosphatase-Related Sequences 

[058] A phosphatase, as indicated above, is an enzyme that catalyzes the
hydrolysis of esters of phosphoric acid. Its substrates include, but are not limited to,
nucleic acids, proteins, and lipids. Together with kinases, phosphatases are active in a
broad range of cellular functions, including transcription, cell division, cell-cycle
progression, intermediate cellular metabolism, glycogen metabolism, lipogenesis and
lipolysis, maintenance of electrochemical gradients, neuronal function, immune
responses, intracellular vesicular transport, cytoskeletal function, sperm motility, and
skeletal, cardiac, and smooth muscle function (Oliver and Shenolikar, 1998).

[059] Disruption in these functions may lead to disorders. For example, as
noted above, phosphatases regulate pathways of cell growth and programmed cell
death; disruptions in these pathways can lead to abnormal cell growth, such as that
which occurs in cancer. Mutations in serine/threonine protein phosphatase 2A
(PP2A), a multifunctional regulator of cell growth and function, are associated with
the increased growth of tumor cells (Schonthal, 2001). The tumor suppressor
"phosphatase and tensin-homology deleted on chromosome 10" (PTEN) gene encodes
PIP3, a lipid phosphatase that dephosphorylates phosphatidylinositol, thus countering
the action of the oncogenes PI3-kinase and Akt, which promote cell survival. PTEN
has been identified as a tumor suppressor; it is deleted in multiple types of advanced
human cancers.

[060] Also as noted above, phosphatases regulate pathways that control
immune function. For example, the CD45 phosphotyrosine phosphatase is one of the
most abundant glycoproteins expressed on immune cells, and regulates T-cell
signaling and development (Alexander, 2000). In addition, the serine/threonine
phosphatase calcineurin plays a central role in lymphocyte activation, among other
important and wide-ranging cellular functions (Baksh and Burakoff, 2000). Certain
compounds, specifically, cyclosporine and FK-506 (Tacrolimus), have been found to
inhibit the phosphatase activity of calcineurin, thereby suppressing the production of
IL-2 and other cytokines. In addition, these compounds have recently been found to
block the JNK and p38 signaling pathways triggered by antigen recognition in T-cells.
Finally, phosphatase inhibitors have proven to be valuable as immune suppressant
drugs, and those in the field believe that modulators of phosphatase activity promise
to be important immunoregulatory compounds (Allison, 2000).

[061] Phosphatase-related sequences can possess or interact with protein
phosphatase 2C (PP2C) domains, which display Mn2+ or Mg2+ dependent protein
serine/threonine phosphatase activity
(http://pfam.wustl.edu/cgi-bin/getdesc?name=PP2C). Phosphatase-related sequences
can also possess or interact with protein-tyrosine phosphatase (Y_phosphatase)
domains, which catalyze the removal of a phosphate group attached to a tyrosine
residue (http://pfam.wustl.edu/cgi-bin/getdesc?name=Y_phosphatase).
Phosphatase-related sequences can also possess or interact with protein phosphatase
inhibitor 1/DARPP-32 (DARPP-32) domains, which inhibit protein phosphatases, and
play a role in regulating neurotransmitter pathways, receptors, and ion channels
(http://pfam.wustl.edu/cgi-bin/getdesc?name=DARPP-32).

[062] Like kinases, phosphatases can be used as targets for therapeutic
intervention, in cell-free or cell-based assays, for example, in screening for drugs,
including small molecule drugs.
Protease-Related Sequences 

[063] Proteases, also known as endopeptidases, are enzymes that cleave
polypeptide chains by hydrolyzing peptide bonds at positions within the amino acid
chain. Different proteases recognize different polypeptide sequences. Endopeptidase
substrate specificities vary from broad to narrow; for example, subtilisins are
relatively non-specific, and can cleave polypeptide chains with a wide variety of
amino acid sequences, whereas thrombin is more specific and can only cleave
polypeptide chains with an arginine residue on the carboxyl side of the susceptible
peptide bond and glycine on the amino side. Additional examples of protease-related
sequences include collagenases, trypsin, and damage-induced neuronal endopeptidase
(Kiryu-Seo et al., 2000).
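
The narrow specificity described for thrombin can be illustrated with a short sketch. It assumes a deliberately simplified rule, namely that a susceptible bond is any bond in which arginine supplies the carboxyl group and glycine supplies the amino group (an Arg-Gly bond in one-letter code, "RG"). The function names and the test peptide are hypothetical and serve only as illustration; the sketch ignores every other determinant of cleavage and is not a predictor of actual thrombin substrates.

```python
from typing import List


def arg_gly_cleavage_sites(sequence: str) -> List[int]:
    """Return 0-based positions i where the bond between residue i-1 (Arg)
    and residue i (Gly) matches the simplified Arg-Gly rule."""
    seq = sequence.upper()
    return [i for i in range(1, len(seq)) if seq[i - 1] == "R" and seq[i] == "G"]


def digest(sequence: str) -> List[str]:
    """Split the sequence at every bond flagged by the simplified rule."""
    cuts = [0] + arg_gly_cleavage_sites(sequence) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    peptide = "AARGSLRGK"  # hypothetical peptide, for illustration only
    print(arg_gly_cleavage_sites(peptide))
    print(digest(peptide))
```

A relatively non-specific protease such as a subtilisin would, under the same kind of model, flag far more bonds, which is the contrast the paragraph above draws.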

[064] Proteases mediate the continuous remodeling of living tissues. For
example, the extracellular matrix, a tissue skeleton that mediates communication
among cells, and influences the structure and function of associated tissues and
organs, is continuously remodeled. A strictly controlled balance is maintained
between breakdown of the extracellular matrix by proteases and reconstruction of the
extracellular matrix. This continued matrix remodeling is a dynamic process that
shapes the structure and function of tissues and organs (Wojtowicz-Praga, 1999).

[065] Defects in protease function are responsible for a number of
disorders, including cancer and other hyperproliferative disorders. Proteases are
involved in the pathogenesis of such disorders by virtue of their involvement in both
programmed cell death and tumor invasion and metastasis (Los et al., 2003;
Stetler-Stevenson et al., 1993). Detection of the presence or characteristics of
proteases can be used to screen for and diagnose prostate cancer (Karazanashvili and
Abrahamsson, 2003). Proteases are also involved in the pathogenesis of
inflammatory and arthritic diseases, such as pancreatitis, osteoarthritis, and
rheumatoid arthritis (Pfutzer and Whitcomb, 2001; Martel-Pelletier et al., 2001; Lerch
and Gorelick, 2000).

[066] Protease-related sequences possess or interact with a variety of
different protease domains, including domains belonging to the cysteine protease
family, the serine protease family, and the metalloproteinase family
(http://pfam.wustl.edu/cgi-bin/textsearch?terms=endopeptidase&search_what=all&sections=DE&sections=CC&size=10).
Phosphodiesterase-Related Sequences 

[067] Phosphodiesterases are enzymes that cleave phosphodiester
bonds, i.e., bonds formed by two hydroxyl groups in an ester linkage to the same
phosphate group, such as those between adjacent RNA or DNA nucleotides.
Phosphodiesterases are found in both soluble and membrane-associated forms. Most
phosphodiesterases act within a network of signal transduction molecules and other
signaling effectors, and are modulated by components of these pathways.
Phosphodiesterases regulate the metabolism and synthesis of cyclic nucleotides in
signal-transduction pathways. They hydrolyze cAMP and cGMP, molecules that play
an important and widespread role in signal transduction. Phosphodiesterases also
repair damage to nucleic acids. Some phosphodiesterases are regulated primarily by
calcium and calmodulin; others are regulated primarily by cGMP. They differ in their
sensitivity to individual inhibitors, but all share a homologous catalytic region (Siegel
et al., 1999).

[068] Examples of phosphodiesterases include nucleotide
pyrophosphatases (NPP) and plasma membrane glycoprotein PC-1, which are present
in elevated levels in the fibroblasts of patients with Lowe's syndrome (Funakoshi et
al., 1992). Another example of a phosphodiesterase is myomegalin,
which is expressed at high levels in the nucleus and cytoplasm of heart and skeletal
muscle (Soejima et al., 2001). Phosphodiesterases have demonstrated promise in
cancer chemotherapy, analgesia, the treatment of Parkinson's disease, and the
treatment of learning and memory disorders (Weishaar et al., 1985).

[069] Phosphodiesterase-related sequences can possess or interact with
type I phosphodiesterase/nucleotide pyrophosphatase (phosphodiest) domains, which
catalyze the cleavage of phosphodiester and phosphosulfate bonds
(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF01663). Phosphodiesterase-related
sequences can also possess or interact with 3',5'-cyclic nucleotide phosphodiesterase
(PDEase) domains, which are involved in signal transduction
(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00233).

[070] Phosphodiesterases (PDEs) are also useful as targets for
therapeutic intervention, for example, for identification of agonists or antagonists,
such as in the screening of small molecule inhibitors. A well known PDE-5 inhibitor,
sildenafil citrate (Viagra®), is used for treatment of erectile dysfunction (Brock,
2000). The mechanism of action involves inhibition of the PDE-5 enzyme and a
resulting increase in cyclic guanosine monophosphate (cGMP) and smooth muscle
relaxation in the penis (Rosen and McKenna, 2002). Such inhibitors may also find use
for treatment of severe pulmonary arterial hypertension (Ghofrani et al., 2003).
Kinesin-Related Sequences 

[071] Cells transport proteins and organelles in an orderly and
regulated manner along cytoskeletal filaments. Molecular motor proteins, such as
kinesins, can carry such cargo along the cytoskeletal filaments to specific
destinations, in a highly regulated manner. Exemplary membrane-bound cargoes
include mitochondria, lysosomes, endoplasmic reticulum, and axonal vesicles (Vale,
2003). Kinesins also transport nonmembranous cargo, such as mRNAs, tubulin
monomers, and intermediate filaments (Vale, 2003).

[072] Kinesins, e.g., KIF11, function in the cell division process (Miki
et al., 2001). In the nucleus, kinesins are necessary to establish spindle bipolarity,
position chromosomes on metaphase plates, and maintain forces in the spindle.
Several members of the kinesin family are associated with the chromosomes, and are
likely to perform a role in mitotic chromosome movement (Miki et al., 2001). For
example, the C-terminal kinesin KIFC1 is involved in the processes of meiosis,
mitosis, and karyogamy (Miki et al., 2001). The kinesin GAKIN binds to the human
analog of the Drosophila Discs Large tumor suppressor protein (hDlg), a membrane
associated guanylate kinase (Hanada, 2000). GAKIN undergoes translocation in
T-lymphocytes upon their cellular activation (Hanada, 2000). The GAKIN/hDlg
complex is also hypothesized to play a role in cell division (Hanada, 2000). Thus, the
kinesin GAKIN plays a role in cell proliferation and T-cell mediated immune
function.

[073] Kinesin-mediated intracellular transport is also implicated as a
mechanism of tumorigenesis. For example, kinesin transports the tumor suppressor
adenomatous polyposis coli protein (APC) (Jimbo et al., 2002). The APC gene is
mutated in both sporadic and familial colorectal tumors. The APC protein interacts
with the microtubule plus-end-directed kinesin proteins KIF3A and KIF3B through an
association with the kinesin superfamily-associated protein 3 (KAP3). Normally, the
APC tumor suppressor is transported to its correct intracellular location at the tips of
membrane protrusions. Mutant APCs derived from cancer cells, however, are unable
to undergo kinesin-mediated transport, and do not accumulate with normal efficiency
in clusters in the membrane protrusions, and thereby cannot function efficiently as
tumor suppressors.

[074] In view of the connection to cancer, investigators have sought
small molecules to inhibit specific molecular motors in cells, such as the mitotic
kinesin Eg5/Ksp (Mayer, 1999). In addition, others have found that small molecule
inhibitors of Eg5/Ksp with low nanomolar affinity have anti-tumor activity, and one
such agent has entered phase I clinical trials (Vale, 2003).

[075] In another arena, it has been proposed that impairing
motor-driven delivery of MHC peptide complexes to the surface of dendritic cells could
provide immunomodulation. Additionally, inhibiting the cell surface delivery of
cytotoxic granules in T cells could help provide immunosuppressive therapy (Vale,
2003).

[076] Kinesin-related sequences can possess or interact with kinesin
motor (kinesin) domains, which hydrolyze ATP and bind to microtubules to produce a
motor-active force that transports intracellular vesicles and organelles
(http://pfam.wustl.edu/cgi-bin/getdesc?name=kinesin). Kinesin-related sequences can
also possess or interact with kinesin-associated protein (KAP) domains, which are
non-motive domains that form a complex with kinesin
(http://pfam.wustl.edu/cgi-bin/getdesc?name=KAP). Kinesin-related sequences can
also possess or interact with MyTH4 domains, which are present in the tail of the
motor ATPase proteins kinesin and myosin
(http://pfam.wustl.edu/cgi-bin/getdesc?name=MyTH4).

[077] Kinesins, like kinases, are useful as targets for therapeutic
intervention, for example, in screening for small molecule inhibitors for the treatment
of cancer.

Immunoglobulin-Related Sequences

[078] An immunoglobulin is an antibody molecule, and is typically
composed of heavy and light chains, each of which has constant regions that display
similarity with other immunoglobulin molecules and variable regions that convey
specificity to particular antigens. Most immunoglobulins can be assigned to classes,
e.g., IgG, IgM, IgA, IgE, and IgD, based on antigenic determinants in the heavy chain
constant region; each class plays a different role in the immune response.

[079] Immunoglobulins are characterized by a structural motif, the
immunoglobulin (ig) domain, which is approximately one hundred amino acids long,
is involved in protein-protein and protein-ligand interactions, and includes a
conserved intradomain disulfide bond
(http://pfam.wustl.edu/cgi-bin/getdesc?name=ig). It is one of the most common
domains found among all known proteins, and is present in hundreds of proteins with
diverse functions. Proteins with the ig domain comprise the immunoglobulin
superfamily; members include antibodies, T-cell receptors, major histocompatibility
proteins, the CD4, CD8, and CD28 co-receptors, most of the invariant polypeptide
chains associated with B and T cell receptors, leukocyte Fc receptors, the giant muscle
kinase titin, and receptor tyrosine kinases (Janeway et al., 2001; Alberts et al., 1994).

[080] Polypeptides with immunoglobulin-like domains can be markers for
specific types of tissues and tumors. For example, a 43-kDa protein membrane
antigen with two immunoglobulin-like domains in its extracellular region is expressed
in normal human colonic and small bowel epithelium and > 95% of human colon
cancers, but absent from most other human tissues and tumor types (Heath et al.,
1997).

[081] Polypeptides with immunoglobulin-like domains are also involved in
inflammation. For example, myelin oligodendrocyte glycoprotein, a myelin-specific
protein found in the central nervous system, specifically binds to and activates
complement, an effector of the immune system, via its extracellular
immunoglobulin-like domain. By virtue of providing the means for an interaction
between myelin and the complement component of the immune response, myelin
oligodendrocyte glycoprotein is a modulator of central nervous system inflammation
and has been predicted by those in the field to be relevant to the pathogenesis of
demyelinating diseases such as multiple sclerosis (Johns and Barnard, 1997).

[082] Immunoglobulin-related sequences can also possess or interact with
leucine-rich repeat domains, which are involved in protein-protein interactions, and
are used in molecular recognition processes as diverse as signal transduction, cell
adhesion, cell development, DNA repair, and RNA processing
(http://pfam.wustl.edu/cgi-bin/getdesc?name=LRRNT). Immunoglobulin-related
sequences can also possess or interact with fibronectin type III repeat (fn3) domains
(http://pfam.wustl.edu/cgi-bin/getdesc?name=fn3), which contain binding sites for
DNA and heparin. Immunoglobulin-related sequences can also possess or interact
with WASp Homology domain 1 (WH1), which can bind the metabotropic glutamate
receptors mGluR1alpha and mGluR5
(http://pfam.wustl.edu/cgi-bin/getdesc?name=WH1).

Glycosylphosphatidylinositol Anchor-Related Sequences 

[083] Glycosylphosphatidylinositol (GPI) anchor proteins are
synthesized as single membrane proteins; the transmembrane segment is cleaved
away in the endoplasmic reticulum, where a GPI membrane anchor is added. The
resulting protein is bound to the non-cytoplasmic, i.e., either extracellular or luminal,
side of the membrane by the GPI anchor. GPI anchor proteins can be dissociated
from the membrane by phosphatidylinositol-inositol-specific phospholipase C
(Alberts et al., 1994). Examples of GPI-anchor proteins include prefoldin, a
chaperone that delivers unfolded proteins to cytosolic chaperonin (Vainberg et al.,
1998), and carboxypeptidase M, which is associated with the differentiation of
monocytes to macrophages (Rehli et al., 1995).

[084] GPI anchor protein-related sequences can possess or interact with
KE2 domains, which may contain a DNA binding leucine zipper motif
(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF01920). GPI anchor protein-related
sequences can also possess or interact with zinc carboxypeptidase (Zn_carbOpept)
domains, which include carboxypeptidase H regulatory domains and carboxypeptidase A
digestive domains (http://www.sanger.ac.uk/cgi-bin/P
Other Polypeptide-Related Sequences

Activator-Related Sequences 

[085] An activator is a molecule or collection of molecules that
positively modulates the activity of a regulatory protein, or that binds to DNA and
regulates one or more genes by increasing the rate of transcription. Regulatory
protein activators contribute to an increase in protein activity. Transcriptional
activators provide a positive control over gene transcription; for example, they can
sense the internal condition of the cell and bind to a sequence of DNA near a target
promoter, resulting in the transcription of an appropriate gene. Examples of
activator-related sequences include template-activating factors, bacterial catabolite
activators, and the coenzyme thiamine pyrophosphatase. Activator-related sequences,
e.g., factors that influence viral replication and transcription, can be encoded by
oncogenes (Nagata et al., 1995).

[086] Activator-related sequences can possess or interact with SH2
domains, which are protein domains of about 100 amino acid residues found in many
signal-transducing proteins. SH2 domains can regulate signaling cascades, e.g., by
interacting with phosphotyrosine-containing target peptides in a sequence-specific and
phosphorylation-dependent manner
(http://pfam.wustl.edu/cgi-bin/getdesc?name=SH2). Activator-related sequences also
possess or interact with nucleosome assembly protein (NAP) domains, which regulate
gene expression, and are accessible to histones
(http://pfam.wustl.edu/cgi-bin/getdesc?name=NAP).

Adaptor-Related Sequences 

[087] Adaptors are proteins involved in the process of capturing
specific cargo molecules into membrane-bound vesicles for transport through the cell.
Different adaptors recognize different receptors for cargo molecules, and also
recognize different vesicle coat proteins, accounting, in part, for the specificity of the
content of intracellular vesicles bound to specific destinations within the cell (Kirsch
et al., 1999). Examples of adaptor-related sequences include adaptins, clathrins,
adaptor-related protein complex subunits, and Cas ligand with multiple Src homology
3 domains (CMS) adaptors.

[088] Adaptor-related sequences can possess or interact with src
homology 3 (SH3) domains, which are small protein modules of approximately 50
amino acid residues found in a variety of intracellular or membrane-associated
proteins. SH3 domains are often indicative of a protein involved in signal
transduction events related to cytoskeletal organization
(http://pfam.wustl.edu/cgi-bin/getdesc?name=SH3). Adaptor-related sequences also
possess or interact with the adaptin N-terminal (Adaptin_N) protein domain, which is
found in the N-terminal region of various adaptor protein complexes. The N-terminal
region of adaptor proteins is relatively constant in comparison to the C-terminal
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Adaptin_N).

Adhesion Molecule-Related Sequences

[089] Adhesion molecules are molecules that mediate the adhesion of
cells with other cells, and with the extracellular matrix. Examples of adhesion
molecules include members of the immunoglobulin superfamily, integrins, cadherins,
selectins, and transmembrane proteoglycans. The adhesion molecule
carcinoembryonic antigen (CEA) is present nearly exclusively on cancer cells, and is
expressed on the cell surface of approximately 80% of all solid cancerous tumors
(Berinstein et al., 2002).

[090] Adhesion molecule-related sequences can possess or interact with
the immunoglobulin (ig) domain, which is described above. Adhesion
molecule-related sequences can also possess or interact with integrin alpha cytoplasmic
region (integrin_A) domains, which comprise the short, intracellular region of the
integrin alpha chain (http://pfam.wustl.edu/cgi-bin/getdesc?name=integrin_A).

Antigen-Related Sequences 

[091] An antigen is a molecule that provokes an immune response; antigens
include both foreign antigens and autoantigens. Antigens can be expressed in a
tissue-specific manner and their expression can be developmentally regulated. For
example, the heat stable antigen HSA is expressed in both a tissue-specific manner,
i.e., it is restricted to hematopoietic cells, and a developmentally-regulated manner,
i.e., it is more highly expressed in immature precursor cells than in terminally
differentiated cells (Wenger et al., 1993). Antigens can be expressed on the cell
surface or inside the cell, e.g., in the nucleus or on intermediate filaments.
Antigen-related sequences include sequences related to tumor antigens, which are
expressed exclusively in tumor cells, or in greater amounts in tumor cells than in
normal cells. Tumor antigens can be transmembrane proteins, with one or more
transmembrane domains (Li et al., 1996; Linnenbach et al., 1993).

[092] Autoantigens, which are components of the body that provoke an 
immune response, are involved in the pathogenesis of autoimmune disease. 
Autoantigens can be either selectively or ubiquitously expressed among cell and 
tissue types. They can be localized to any region of the cell, including the nucleus, 
nucleolus, nuclear envelope, and intermediate filaments (Racevskis et al., 1996). For 
example, pancreatic islet cell antigens are involved in the autoimmune pathogenesis 
of diabetes, and thyroid antigens are involved in autoimmune thyroid disease. 

[093] Antigen-related sequences can possess or interact with the ICA69 
domain, which is characterized by a 69 kDa pancreatic islet cell autoantigen present in 
autoimmune (insulin-dependent) diabetes mellitus 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=ICA69). Antigen-related sequences can 
also possess or interact with the Ku70/Ku80 C-terminal arm (Ku_C) or Ku70/Ku80 
N-terminal alpha/beta (Ku_N) domains, which belong to the Ku family of peptides 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Ku_C; 
http://pfam.wustl.edu/cgi-bin/getdesc?name=Ku_N). Ku, an antigen associated with 
autoimmune disease, normally functions to bind DNA double-strand breaks and 
facilitate DNA repair, but induces autoimmunity under pathological conditions. 
Antigen-related sequences can also possess or interact with the bZIP transcription 
factor (bZIP) domain, which comprises a basic region and a leucine zipper region 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=bZIP). Antigen-related sequences can 
possess or interact with YT521-B-like (YTH) domains, which comprise YT521-B, a 
tyrosine-phosphorylated nuclear protein domain that modulates alternative RNA 
splice site selection, and interacts with other nuclear proteins, e.g., scaffold 
attachment factor B, and Sam68, a 68-kDa substrate associated with Src during 
mitosis (http://pfam.wustl.edu/cgi-bin/getdesc?name=YTH). 

ATPase-Related Sequences 

[094] ATPases are enzymes that use the energy of ATP hydrolysis to 
move ions or small molecules across a membrane against a chemical concentration 
gradient or electrical potential. For example, ATPases can maintain low intracellular 
calcium and sodium ion concentrations, and generate a low pH inside lysosomes, 
plant cell vacuoles, and the lumen of the stomach. Vacuolar ATPases are ATP- 
dependent proton pumps that create pH gradients by transporting protons across 
membranes, while coupling the energy produced in the conversion of ATP to ADP 
with proton transport (Forgac, 1999). They can acidify or alkalinize cells, organelles, 
and extracellular compartments, and create voltage gradients that 
or absorption of ions and fluids (Wieczorek et al., 1999). Examples of ATPase-related 
sequences include proton transporters, glucose transporters, multidrug resistance 
factors, calcium ATPases, and porins. 

[095] ATPase-related sequences can possess or interact with ATP 
synthase F/14-kDa subunit (ATP-synt_F) domains, which correspond to a 14-kDa 
subunit in the peripheral catalytic part of vacuolar ATPases 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=ATP-synt_F). ATPase-related sequences 
can also possess or interact with vacuolar (H+)-ATPase C, D, G, and H subunit 
(V-ATPase) domains, which are membrane-attached sequences that generate an 
acidic environment (http://pfam.wustl.edu/cgi-bin/getdesc?name=V-ATPase_G). 

ATP-Related Sequences 

[096] Adenosine triphosphate (ATP) is a nucleotide comprising an 
adenine, a ribose, and a triphosphate unit. The triphosphate unit contains two 
phosphoanhydride bonds that confer an energy-rich property to ATP. The free energy 
liberated in the hydrolysis of one or both of these bonds can drive reactions that 
require an input of free energy. A wide range of physiological and pathological 
processes are driven by the energy of ATP, including cellular movement, the 
synthesis of biomolecules from precursors, muscle contraction, ciliary and flagellar 
function, intermediary metabolism, glycolysis, fatty acid oxidation, oxidative 
phosphorylation, and membrane transport (Ku et al., 1990). Examples of ATP-related 
sequences include ATPases, ATP synthases, ATP carrier proteins, and myosin. 
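
The free energy liberated by ATP hydrolysis depends on the concentrations of ATP, 
ADP, and inorganic phosphate. The following is a minimal sketch of that standard 
thermodynamic relation, delta G = delta G0' + RT ln([ADP][Pi]/[ATP]); it is not taken 
from this document. The function name, the textbook standard value of about 
-30.5 kJ/mol, and the example concentrations are all illustrative assumptions. 

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def atp_hydrolysis_delta_g(atp_molar, adp_molar, pi_molar,
                           delta_g0_prime=-30.5e3, temp_kelvin=310.15):
    """Return the free energy change (J/mol) for ATP + H2O -> ADP + Pi.

    delta_g0_prime is an assumed textbook standard value; concentrations
    are in mol/L.
    """
    mass_action_ratio = (adp_molar * pi_molar) / atp_molar
    return delta_g0_prime + R * temp_kelvin * math.log(mass_action_ratio)


# Hypothetical cellular concentrations, chosen only for illustration.
dg = atp_hydrolysis_delta_g(atp_molar=5e-3, adp_molar=0.5e-3, pi_molar=5e-3)
print(f"{dg / 1000:.1f} kJ/mol")
```

Under these assumed concentrations the result is near -50 kJ/mol, i.e., more 
negative than the standard value, which illustrates why hydrolysis can drive 
reactions that require an input of free energy. 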

[097] ATP-related sequences can possess or interact with ATP- 
synthase subunit C protein domains (ATP-synt_C), which are protein domains that 
consist of two long terminal hydrophobic regions, and are implicated in the proton- 
conducting activity of ATPases 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=ATP-synt_C). ATP-related sequences 
can also possess or interact with mitochondrial carrier protein (mito_carr) domains, 
which are involved in energy transfer across the inner mitochondrial membrane 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=mito_carr). 

Binding Protein-Related Sequences 

[098] A binding protein is a protein that binds to another molecule with 
specificity. Binding proteins can be involved in building macromolecular structures, 
e.g., in cytoskeletal assembly or scaffolding (Machesky et al., 1997). Proteins often 
exist in the cell in complexes with other proteins, nucleic acids, lipids, and/or small 
molecules. For example, steroid receptors, e.g., the progestin, estrogen, androgen, 
and glucocorticoid receptors, bind to heat-shock proteins and FKBP52, a calcium- 
regulated immunosuppressant, to form functional complexes (Peattie et al., 1992; 
Sanchez et al., 1990). DNA binding proteins and general transcription factors bind to 
the TATA box, a consensus sequence in a gene's promoter region that specifies the 
position of transcription initiation, forming a functional transcription complex (Chalut 
et al., 1995). Proteins can interact with multiple molecules simultaneously. For 
example, Nedd4, a ubiquitin-protein ligase, can interact with multiple proteins and 
lipids through its lipid binding domain and multiple protein binding domains (Jolliffe 
et al., 2000). 

[099] Proteins utilize a large number of motifs to bind other molecules. 
Binding protein-related sequences can possess or interact with the cold-shock DNA- 
binding (CSD) domain, a conserved domain of about 70 amino acids that helps the 
cell survive in temperatures below optimum growth temperature by inducing the 
synthesis of proteins that negatively regulate transcription, translation, and 
recombination, resulting in suppressed cell proliferation 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=CSD). Proteins induced by exposure to 
cold include DNA-binding proteins, and cold inducible RNA binding proteins, which 
have RNA binding domains at or near their N-termini (Nishiyama et al., 1997). For 
example, contrin, a testis-specific DNA/RNA binding protein with a cold shock 
domain, also has a large number of phosphorylation sites, each of which can mediate 
intermolecular interactions (Tekur et al., 1999). Contrin is involved in transcription 
of testis-specific genes; its inactivation could provide a reversible male contraceptive. 

[0100] Binding protein-related sequences can possess or interact with the 
ARID/BRIGHT DNA binding (ARID) domain, which is an approximately 100 amino 
acid sequence involved in a wide range of DNA interactions, including, but not 
limited to, interaction with AT-rich regions 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=ARID). ARID-encoding genes are 
involved in a variety of biological processes, including regulation of cell growth, 
development, cell lineage gene regulation, cell cycle control, and tissue-specific gene 
expression. 

[0101] Binding protein-related sequences can also possess or interact with 
nucleosomal binding domains to facilitate binding within the nucleosome, a nuclear 
structure comprised of chromosomal DNA and proteins. For example, the HMG14 
and HMG17 (HMG14_17) domain is present in some nucleosome proteins, most 
commonly in proteins HMG14 and HMG17, members of a family designated as high 
mobility group proteins, which form components of chromatin, and bind to 
nucleosomal DNA, regulating the interaction of the DNA with histone proteins 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=HMG14_17). 

[0102] Binding protein-related sequences can also possess or interact with 
conserved motifs that recognize RNA, and allow the protein to bind RNA 
(http://pfam.wustl.edu/cgi-bin/textsearch?terms=rna+binding&search_what=all&sections=DE&sections=CC&size=100). 
These motifs include the RNA recognition (rrm) domain, also known as a RRM, RBD, 
or RNP domain (http://pfam.wustl.edu/cgi-bin/getdesc?name=rrm). Numerous RNA 
binding proteins possess the rrm domain, including heterogeneous nuclear 
ribonucleoprotein (hnRNP) proteins, which are implicated in the regulation of 
alternative splicing, and LA proteins, which are among the main autoantigens in 
systemic lupus erythematosus (SLE). 

[0103] Binding protein-related sequences can also possess or interact with 
conserved motifs that mediate their binding to ions, e.g., calcium. Calcium-binding 
proteins such as calmodulin, the calcineurins, and their homologues and related 
proteins are widely used to regulate cellular processes 
(http://pfam.wustl.edu/cgi-bin/textsearch?terms=calcium+binding&search_what=all&sections=DE&sections=CC&size=100). 
Ion-binding proteins include phosphoproteins that bind to other 
molecules in a manner dependent on their phosphorylation state, and can regulate 
many types of molecules and processes, including those that utilize complex signaling 
cascades (Pang et al., 2001; Pang et al., 2002; Lin et al., 1999). Ion-binding protein- 
related sequences can possess or interact with the EF hand (efhand) domain, a 
calcium-binding domain that comprises a loop of twelve amino acids that coordinates 
a calcium ion in a pentagonal bipyramidal configuration and is flanked on both sides 
by a twelve amino acid alpha-helical domain 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=efhand). 

Breakpoint-Related Sequences 

[0104] A breakpoint is the location on a chromosome where a gene is 
disrupted, and one segment of the gene is severed from the other. Chromosomal 
breaks that disrupt coding or regulatory sequences can result in gene mutation. 
Chromosomal breaks can also serve as molecular landmarks, e.g., a break can be 
detected on Southern blots as the loss of an expected band and the appearance of two 
novel bands. Examples of breakpoint-related sequences include the sequences that 
generate the Philadelphia chromosome translocation, the sequences that generate the 
chromosome translocation t(1;7)(q42;p15), which is implicated in Wilms' tumor, 
and the sequences that generate the chromosomal translocation t(18;21)(q22.1q^l3), 
which is implicated in Down syndrome. 

[0105] Breakpoints commonly occur in discrete regions of the 
chromosome. Breakage at these regions can lead to a recognized disease phenotype. 
One way of generating such a phenotype is by chromosomal translocation, i.e., 
chromosomes mutate by exchanging parts. When a segment from one chromosome is 
exchanged with a segment from another nonhomologous chromosome, two mutated 
chromosomes are simultaneously generated (Griffiths et al., 1999). The Philadelphia 
chromosome, a mutation sometimes associated with chronic myelogenous leukemia 
(CML), is an example. It results from the translocation of a discrete segment of 
chromosome 22 into a discrete region of chromosome 9. Patients with the 
Philadelphia chromosome mutation generally have a better prognosis than CML 
patients with other characteristics. 
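
The exchange of segments between two nonhomologous chromosomes can be modeled 
with a few lines of code. The sketch below is illustrative only: chromosomes are 
represented as ordered lists of segment labels, and the function name, the segment 
labels, and the breakpoint positions are hypothetical placeholders rather than data 
from this document. 

```python
def reciprocal_translocation(chrom_a, chrom_b, break_a, break_b):
    """Swap the segments distal to each breakpoint between two chromosomes.

    Each chromosome is an ordered list of segment labels; break_a and
    break_b are list indices marking where each chromosome is severed.
    Returns the two derivative chromosomes.
    """
    derivative_a = chrom_a[:break_a] + chrom_b[break_b:]
    derivative_b = chrom_b[:break_b] + chrom_a[break_a:]
    return derivative_a, derivative_b


# Hypothetical coarse segment labels, for illustration only.
chr9 = ["9p", "9q-proximal", "9q-distal"]
chr22 = ["22p", "22q-proximal", "22q-distal"]

der9, der22 = reciprocal_translocation(chr9, chr22, break_a=2, break_b=2)
print(der9)
print(der22)
```

A single exchange produces two derivative chromosomes at once, each carrying a 
junction between material from the two original chromosomes. 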

[0106] Acquired clonal chromosomal abnormalities are found in the 
malignant cells of most patients with leukemia, lymphoma, and solid tumors. Some 
of these abnormalities are the result of consistent chromosomal rearrangements. For 
example, in a preponderant number of chronic myelogenous leukemia cases, 
breakpoints at chromosome band 22q11 occur within a breakpoint cluster region of 
5-6 kb (Weinstein et al., 1988). 

[0107] Chromosome rearrangements affecting band 3q21 are associated 
with a particularly poor prognosis in myeloid leukemia or myelodysplasia. These 
breakpoints cluster in a breakpoint cluster region of approximately 30 kb, located 
centromeric and downstream of the ribophorin I (RPN-I) gene (Weiser, 2002). The 
apoptotic gene bcl-2 was isolated as a breakpoint rearrangement in human follicular 
lymphomas and was shown to act as an oncogene that promoted cell survival rather 
than cell proliferation. 

[0108] Some proteins can act as leukemia or lymphoma-specific 
antigens for major histocompatibility complex-restricted T cell cytotoxicity. These 
include the breakpoint cluster region (bcr)-abl, and other fusion oncoproteins. 
Genetically engineered chimeric and humanized antibodies have demonstrated 
activity against overt lymphomas and leukemias. Radioimmunotherapy has produced 
significant therapeutic responses with minimal radiation exposure to normal tissues 
(Jurcic et al., 2000). 

[0109] Breakpoint-related sequences can possess or interact with 
RhoGAP domains, which are also known as breakpoint cluster region-homology 
domains and mediate signal transduction by small G proteins 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=RhoGAP). Breakpoint-related sequences 
can also possess or interact with RhoGEF domains, which comprise approximately 
200 amino acid residues that encode a guanine nucleotide exchange factor 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=RhoGEF). Breakpoint-related sequences 
can also possess or interact with Plectin/S10 (S10_plectin) domains, which are found 
at the N-terminus of some isoforms of plectin and ribosomal S10 protein 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=S10_plectin). 

Carrier or Transport-Related Sequences 

[0110] A membrane transport protein is an integral transmembrane protein 
that aids the movement of one or more molecules across a cell membrane. Most, if 
not all, types of molecules are transported across membranes, including proteins, 
ions, and fatty acids (Schaffer and Lodish, 1994). Even the movement of molecules 
such as water and urea, which can diffuse across pure phospholipid bilayers, is 
frequently accelerated by transport proteins. Transporters clear cells of toxins, and 
confer drug resistance on tumor lines (Ramalho-Santos et al., 2002). The rate of 
transport varies considerably among membrane transport proteins. Membrane 
transport proteins function in the plasma membrane and in intracellular organellar 
membranes, including the nuclear, mitochondrial, lysosomal, and vesicular 
membranes. For example, transportin, also known as karyopherin beta2, imports 
nuclear mRNA binding proteins from the cytoplasm across the nuclear membrane, 
into the nucleus (Bonifaci et al., 1997). 

[0111] Membrane transport proteins can have either a broad or a narrow 
range of specificity for the transported substance. In mammalian cells, nucleoside 
transport across membranes is mediated by broad specificity transporters. Nucleoside 
transport plays a role in such diverse cellular functions as nucleotide synthesis, 
neurotransmission, and platelet aggregation. Nucleoside transporters carry 
chemotherapeutic nucleosides, and are a target of interest in chemotherapeutic and 
cardiac drug design (Griffiths et al., 1997; Ku et al., 1990). 

[0112] Carriers are another class of membrane transport proteins; they bind 
to a solute and transport it across the membrane by undergoing a series of 
conformational changes. In contrast to channel proteins, transporters bind only one, 
or a few, substrate molecules at a time; after binding substrate molecules, they 
undergo a conformational change such that the bound substrate molecules, and only 
those molecules, are transported across the membrane. Carriers transport a wide 
variety of molecules, including fatty acids across the plasma membrane (Schaffer and 
Lodish, 1994); purines, pyrimidines, and components of nucleosides across the 
nuclear membrane; and adenine nucleotides across the inner mitochondrial membrane 
(Battini et al., 1997). 

[0113] Membrane transport-related sequences can possess or interact with 
vacuolar (H+)-ATPase C, D, G, and H subunit (V-ATPase) domains, which are 
membrane-attached sequences that generate an acidic environment 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=V-ATPase_C). Membrane transport- 
related sequences can also possess or interact with nucleoside transporter 
(nucleoside_tran) domains, which are found in proteins that transport nucleosides 
across the plasma membrane, and are employed to synthesize nucleotides via the 
salvage pathways in cells that lack their own de novo synthesis pathways 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Nucleoside_tran). Membrane transport- 
related sequences can also possess or interact with ATP synthase F/14-kDa subunit 
(ATP-synt_F) domains, which correspond to a 14-kDa subunit in the peripheral 
catalytic part of vacuolar ATPases 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=ATP-synt_F). Membrane transport-related 
sequences can also possess or interact with mitochondrial carrier protein (mito_carr) 
domains, which are involved in energy transfer across the inner mitochondrial 
membrane (http://pfam.wustl.edu/cgi-bin/getdesc?name=mito_carr). Membrane 
transport-related sequences can also possess or interact with an AMP-binding enzyme 
(AMP-binding) domain, which is a domain rich in serine, threonine, and glycine, and 
is characterized by a conserved proline-lysine-glycine triplet sequence 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=AMP-binding). 

[0114] Membrane transport proteins, such as those expressed in cancer cells, 
are useful as targets for therapeutic intervention, for example, in the screening for 
small molecule inhibitors. Inhibition of membrane transport, as indicated above, may 
make cancer cells more susceptible to chemotherapy, for example. 

Channel-Related Sequences 

[0115] Channel proteins transport water or specific types of ions down 
their concentration or electrical potential gradients. They form a protein-lined 
passageway across the membrane through which multiple water molecules or ions 
move at a very rapid rate, e.g., up to 10^ per second. The plasma membrane, for 
example, contains potassium-specific channel proteins that generate the cell's resting 
electric potential across the plasma membrane. Examples of channel-related 
sequences include the sodium hydrogen exchanger, sodium potassium ATPase, and 
the cystic fibrosis transmembrane regulator. 
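
The contribution of a potassium-selective channel to the resting potential can be 
estimated with the Nernst equation, a standard electrochemical relation that this 
document does not itself cite. The sketch below is illustrative only: the function 
name and the example concentrations (5 mM outside, 140 mM inside, typical textbook 
figures) are assumptions, not values from the source. 

```python
import math

R = 8.314    # gas constant in J/(mol*K)
F = 96485.0  # Faraday constant in C/mol


def nernst_potential_mv(conc_outside, conc_inside, charge=1, temp_kelvin=310.15):
    """Return the equilibrium potential (mV) for one ion species.

    Concentrations may be in any unit as long as both use the same one.
    """
    volts = (R * temp_kelvin) / (charge * F) * math.log(conc_outside / conc_inside)
    return 1000.0 * volts


# Assumed textbook potassium concentrations in mM, for illustration only.
e_k = nernst_potential_mv(conc_outside=5.0, conc_inside=140.0)
print(f"{e_k:.0f} mV")
```

With these assumed concentrations the potassium equilibrium potential is close to 
-90 mV, i.e., inside-negative, consistent with potassium channels setting a negative 
resting potential. 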

[0116] Members of this subset of membrane transport proteins have 
wide-ranging functions in both normal physiology and in pathology. For example, the 
transport system that mediates the transmembrane exchange of sodium for hydrogen 
across the plasma membrane plays a physiological role in the regulation of 
intracellular pH, the control of cell growth and proliferation, stimulus-response 
coupling, metabolic responses to hormones, the regulation of cell volume, and the 
transepithelial absorption and secretion of several ions. The sodium-hydrogen 
exchanger also plays a role in cancer and in tissue and organ hypertrophy 
(Mahnensmith and Aronson, 1985). 

[0117] Channel-related sequences can possess or interact with 
sodium/hydrogen exchanger (Na_H_Exchanger) domains, which exchange sodium 
for hydrogen across a membrane in an electroneutral manner 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Na_H_Exchanger). Channel-related 
sequences can also possess or interact with neurotransmitter-gated ion-channel ligand 
binding (Neur_chan_LBD) domains, which form the extracellular domains of some 
ion channels (http://pfam.wustl.edu/cgi-bin/getdesc?name=Neur_chan_LBD). Channel- 
related sequences can also possess or interact with UBX domains, which are present 
in ubiquitin-regulatory proteins (http://pfam.wustl.edu/cgi-bin/getdesc?name=UBX). 

Checkpoint-Related Sequences 

[0118] The cell division cycle is the fundamental means by which living 
things are propagated. Fundamental to successful propagation is the faithful 
replication of DNA; a cell cycle control system exists to coordinate the cycle as a 
whole. The control system is regulated by brakes that can stop the cycle at specific 
checkpoints. Thus, the checkpoints arrest the cycle upon the occurrence of 
undesirable events, such as DNA damage, replication stress, or mitotic spindle 
disruption. For example, DNA lesions and disrupted replication forks are recognized 
by the DNA damage checkpoint and replication checkpoint, respectively. 
Checkpoints can also, for example, initiate protein kinase-based signal transduction 
cascades to activate downstream effectors that elicit cell cycle arrest, DNA repair, or 
apoptosis. These actions prevent the conversion of aberrant DNA structures into 
inheritable mutations and minimize the survival of cells with unrepairable damage 
(Qin and Li, 2003). 

[0119] Dysregulation of the cell cycle is a hallmark of tumor cells. 
Defective checkpoint function results in genetic modifications that contribute to 
tumorigenesis. Checkpoint function can be abrogated by many different mechanisms 
(Bast et al., 2000). For example, cyclin-dependent kinases that normally are 
activated at a checkpoint can be inactivated or activated in an abnormal manner. 
Alternatively, the normal activities of the cyclin-dependent kinases, 
phosphatases, or other regulatory molecules of the cell cycle can be altered. Tumor 
suppressors are among the classes of molecules that can effect cell cycle 
dysregulation. The abrogation of checkpoint function can alter the sensitivity of 
tumor cells to chemotherapeutics (Stewart et al., 2003). 

[0120] Checkpoint-related sequences can possess or interact with 
phosphoribosylaminoimidazole-succinocarboxamide synthase (SAICAR_synt) 
domains, which function in de novo purine synthesis 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=SAICAR_synt). Checkpoint-related 
sequences can also possess or interact with WD40 domains, which comprise a domain 
of approximately 40 amino acids, and which are sometimes present in tandem repeats 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=WD40). Checkpoint-related sequences 
can also possess or interact with cyclin, C-terminal (cyclin_C) domains, which 
regulate cyclin dependent kinases 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=cyclin_C). 

[0121] Thus, checkpoint related proteins, e.g., kinases, phosphatases, 
etc., are useful as targets for therapeutic intervention, such as in screening for small 
molecule drugs for the treatment of cancer, immune disorders, and inflammation. 

Complex-Related Sequences 

[0122] Complexes are molecular entities comprised of two or more 
components. Molecular complexes within cells form functional units that carry out 
cellular operations. For example, complexes at the cell membrane perform structural 
and regulatory tasks, including regulating membrane traffic and maintaining organelle 
integrity. Complexes at the cytoskeleton perform static and dynamic roles with 
respect to cell shape, intracellular transport, and communication with the extracellular 
matrix. Complexes in the nucleus transcribe and regulate genes, and complexes at 
sites of protein synthesis translate and regulate proteins. Complexes can reside 
intracellularly and/or extracellularly, e.g., in the extracellular matrix. Examples of 
complex-related sequences include cytoskeletal and filamentous proteins, ADP- 
ribosylation factor (ARF) proteins, and protein synthesis initiation factors (Amor et 
al., 1994). 

[0123] Complex-related sequences can possess or interact with ADP- 
ribosylation factor family (arf) domains, which are GTP-binding domains involved in 
protein trafficking (http://pfam.wustl.edu/cgi-bin/getdesc?name=arf). Complex- 
related sequences can also possess or interact with eukaryotic initiation factor 
domains, e.g., the eukaryotic initiation factor 4E (IF4E) domain, which recognizes 
and binds mRNA during protein synthesis 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=IF4E). Complex-related sequences can 
also possess or interact with intermediate filament (filament) protein domains, which 
form filamentous structures typically 8 to 14 nm wide, and form components of the 
cytoskeleton and nuclear envelope, e.g., neurofilaments, cytokeratins, lamins, 
vimentin, and desmin (http://pfam.wustl.edu/cgi-bin/getdesc?name=filament). 

Cytokine-Related Sequences 

[0124] A cytokine is an extracellular signaling protein or peptide that acts as 
a local mediator in communication among cells. Cytokines regulate proliferation and 
differentiation; for example, they mediate differentiation of cells in the hematopoietic 
lineage. Examples of cytokines include interleukins, interferons, and colony 
stimulating factors of the hematopoietic system. Some cytokines, e.g., interferons and 
interleukins, can be induced by viral activity, and possess antiviral activity (Sheppard 
et al., 2003). Cytokine-related sequences may enable the expression of a cytokine, for 
example, as a cytokine transcription factor (Kao et al., 1994). They can also be part 
of a cytokine effector pathway, for example, as an intracellular effector of cytokine- 
related cytoskeletal changes in response to events in the extracellular matrix (Hirsh et 
al., 2001; Joberty et al., 1999). 

[0125] Cytokine-related sequences can possess or interact with interferon- 
induced transmembrane protein (CD225) domains, which are associated with 
interferon-induced cell growth suppression 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=CD225). Cytokine-related sequences can 
also possess or interact with SelR (SelR) domains, which bind both selenium and 
zinc, and/or methionine sulfoxide reductase enzymatic domains 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=SelR). Cytokine-related sequences can 
also possess or interact with reverse transcriptase (rvt) domains, which are involved 
in RNA-directed DNA polymerase activity, an enzymatic activity that uses an RNA 
template to produce DNA for integration into a host genome 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=rvt). Cytokine-related sequences can 
also possess or interact with L1 transposable element domains (Transposase_22), 
which are described above. 

[0126] Cytokines, thus, are useful as therapeutic proteins for the treatment of 
disorders such as cancer, immune disorders, and inflammation. 

Dehydrogenase-Related Sequences 

[0127] Dehydrogenases are enzymes that catalyze the removal of 
hydrogen atoms in the absence of oxygen. They contribute to a wide range of 
enzymatic reactions, including those involved in amino acid degradation, amino acid 
synthesis, the citric acid cycle, fatty acid oxidation, fatty acid synthesis, glycolysis, 
the pentose phosphate pathway, photosynthesis, pyruvate oxidation, and oxidative 
phosphorylation (Walker et al., 1992). Examples of dehydrogenases include steroid 
dehydrogenases, NADH dehydrogenases, and glyceraldehyde-3-phosphate 
dehydrogenase. 

[0128] Dehydrogenase-related sequences can possess or interact with 
glyceraldehyde 3-phosphate dehydrogenase, NAD binding (GPDH) domains, which 
play a role in glycolysis and gluconeogenesis by reversibly catalyzing the oxidation 
and phosphorylation of D-glyceraldehyde-3-phosphate to 1,3-diphospho-glycerate 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=gpdh). Dehydrogenase-related 
sequences can also possess or interact with 3-hydroxyacyl-CoA dehydrogenase, NAD 
binding (3HCDH_N) domains, which catalyze the oxidation of 3-hydroxyacyl-CoA to 
3-oxoacyl-CoA in fatty acid metabolism 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=3HCDH_N). 

Disease-Related Sequence 

Amyotrophic Lateral Sclerosis 

[0129] Amyotrophic Lateral Sclerosis (Lou Gehrig's Disease) is a 
neurodegenerative disease that affects the motor neurons. The disease displays 
multiple clinical variants and can affect motor neurons throughout the nervous 
system, e.g., the spinal cord and brainstem. One clinical variant, the autosomal 
recessive form of juvenile amyotrophic lateral sclerosis, has been mapped to the 
human chromosome 2q33-q34 region (Hadano et al., 2001). A protein family 
characterized by the HAP1 N-terminal conserved region (HAP1_N) domain possesses 
an N-terminal conserved region from hypothetical protein products of ALS2CR3 genes 
found in the 2q33-2q34 region of chromosome 2 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=HAP1_N). 

Gaucher's Disease 

[0130] Gaucher's Disease is a genetic disease characterized by a deficiency 
of enzymes responsible for the breakdown and recycling of glycolipids, i.e., lipids 
with carbohydrate moieties, e.g., glucosylceramide; and sphingolipids, lipids with 
sphingosine moieties, e.g., sphingomyelin. Normally, the glycolipids and 
sphingolipids in the membranes of senescent cells are metabolized by a multi-step 
process that includes the activities of acid beta-glucosidases and saposins. When 
these activities are absent, or present in reduced amounts, glucosylceramide and 
sphingolipids accumulate, and produce the Gaucher's disease phenotype. The disease 
displays multiple clinical variants, and can manifest with central nervous system 
pathology, enlargement of organs, e.g., liver and spleen, and an increase in the level 
of the cytokine transforming growth factor beta (Zhao and Grabowski, 2002; Perez 
Calvo et al., 2000; Cormand et al., 1997). The variability in clinical presentation is 
consistent with the large number of different mutations observed in the acid beta- 
glucosidase and saposin genes. 

[0131] Acid beta-glucosidases are enzymes that metabolize glycolipids. 
Saposins are small proteins that are described in more detail below. Mammalian 
saposins are synthesized as a single precursor molecule (prosaposin) with saposin-A 
(SAPA) and saposin-B (SapB_1; SapB_2) domains; prosaposin becomes an active 
saposin following a proteolytic activation reaction 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=SAPA; 
http://pfam.wustl.edu/cgi-bin/getdesc?name=SapB_1; 
http://pfam.wustl.edu/cgi-bin/getdesc?name=SapB_2). 

Huntington Disease 

[0132] Huntington Disease is a progressive neurodegenerative genetic 
disorder characterized by dementia, psychiatric symptoms, and a choreiform 
movement disorder. It is caused by an increased number of repeats of the codon 
CAG, which encodes the amino acid glutamine, in a gene located at the 4p16.3 region 
of chromosome 4, which codes for a protein called huntingtin. The polyglutamine 
tracts expressed by the mutant form of the gene selectively ablate striatal and cortical 
neurons (Ho et al., 2001). 
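
The relationship between CAG repeat number and polyglutamine tract length can be 
made concrete with a short sketch. The code below is illustrative only: the function 
names and the toy sequence are hypothetical, the sequence is not the actual gene, and 
the simple pattern match ignores reading frame, which a real analysis would need to 
respect. 

```python
import re


def longest_cag_run(dna):
    """Return the number of CAG codons in the longest uninterrupted CAG run.

    Simplification: runs are found by pattern matching on the raw string,
    without checking that they lie in the translated reading frame.
    """
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


def polyglutamine_tract(repeat_count):
    """Each CAG codon encodes one glutamine (single-letter code Q)."""
    return "Q" * repeat_count


# Toy sequence invented for illustration; not taken from any real gene.
toy_allele = "ATG" + "CAG" * 7 + "CCGCCA"
repeats = longest_cag_run(toy_allele)
print(repeats, polyglutamine_tract(repeats))
```

Because each additional CAG codon adds one glutamine, an expanded repeat in the 
gene translates directly into a longer polyglutamine tract in the protein. 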

[0133] The Huntington Disease gene is widely expressed, but exerts tissue- 
specific effects on neurons (Lin et al., 1993). The gene expresses multiple distinct 
transcripts, and differential polyadenylation of the gene leads to the expression of 
transcripts of different sizes (Lin et al., 1993). There is a relative increase in the 
abundance of one transcript in the human brain, which has been hypothesized to 
account for the tissue-specific effects of the disease (Lin et al., 1993). The HAP1_N 
protein domain, described above, binds to the gene product, huntingtin, in a 
polyglutamine repeat-length-dependent manner 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=HAP1_N). This domain is also found in 
several huntingtin-associated protein 1 (HAP1) homologues. 

Multiple Sclerosis (MS) 

[0134] Multiple sclerosis (MS) is a disease characterized by demyelination, 
i.e., the loss of the myelin coating, of nerve axons. Its clinical course varies among 
patients; these variations fall into two broad categories, a relapsing/remitting course,
and a chronic progressive course. MS has a complex etiology; it has an autoimmune
component, is influenced by genetics, and sometimes involves infectious agents. MS
results from an abnormal immune response to one or more antigens present in the
myelin sheaths that cover the nerve axons of genetically susceptible individuals,
which may be preceded by exposure to a causal infectious agent (Oksenberg et al., 
1999). 

[0135] The genetic susceptibility to MS is determined by MS susceptibility
genes, most of which demonstrate only a small to moderate effect on susceptibility,
e.g., the major histocompatibility complex at chromosome 6p21 (Oksenberg et al.,
1999). An etiological infectious agent has been isolated from the plasma and
cerebrospinal fluid of patients with multiple sclerosis (Perron et al., 1997). This agent
is a retroviral oncovirus, known as multiple sclerosis-associated retrovirus (MSRV),
also called LM7, and is found in association with virions produced by the cultured
cells of MS patients (Perron et al., 1997). MSRV proteins possess protein domains
characteristic of retroviral proteins. These include the Gag P30 core shell protein
(Gag_p30) domain, which is involved in viral assembly (http://pfam.wustl.edu/cgi-
bin/getdesc?name=Gag_p30) and the reverse transcriptase (rvt) domain, which was
described above.

Obesity

[0136] Although single-gene mutations have been shown to cause obesity in
animal models, the most common forms of human obesity arise from the interactions
of multiple genes, environmental factors, and behavior. Several genes have been
shown to affect body weight regulation in humans and other animals. These include
the ob, lep, CPE, ASIP, LEP, TUB, UPC, POMC, CCKAR, TNFA, and PPAR-γ
genes (Comuzzie et al., 1998). Genetic regulation of body weight can be effected
through diverse mechanisms. For example, the TUB gene family regulates body
weight by encoding proteins that are phosphorylated in response to insulin, mediate
insulin signaling, and are associated with a maturity onset obesity associated with
insulin resistance (Ikeda et al., 2002). CCKAR genes regulate body weight in a
different manner; they regulate the hormone cholecystokinin, which produces a
feeling of satiety following food intake (Ritter et al., 1994).

[0137] Some genes that regulate body weight possess the WH1 domain,
which is described above. Genes that regulate body weight can also possess or
interact with the sprouty (sprouty) domain. This domain is found in sprouty proteins,
which inhibit the Ras/mitogen-activated protein kinase cascade, a pathway initiated
by receptor tyrosine kinases and involved in development (http://pfam.wustl.edu/cgi-
bin/getdesc?name=Sprouty). Genes that regulate body weight can also possess or
interact with a Tub (Tub) domain, which is found in Tubby, a mouse gene in which an
autosomal recessive mutation resulting from a splicing defect causes maturity-onset
obesity, insulin resistance and sensory deficits (http://pfam.wustl.edu/cgi-
bin/getdesc?name=Tub).

Oncogene

[0138] An oncogene is any one of a large number of genes that can help
make a cell cancerous. Typically, an oncogene is a mutant form of a normal gene,
and is often a gene involved in the control of cell growth, division, or differentiation.
Cells in higher organisms normally grow, divide, differentiate, and die under the
regulation of other cells. Cancer cells proliferate, in part, because they are able to
divide without input from other cells, as the result of accumulated mutations.
Oncogenes include, but are not limited to, genes encoding GTP binding proteins, e.g.,
ras; growth factors, e.g., platelet-derived growth factor; growth factor receptors, e.g.,
platelet-derived growth factor receptor; kinases, e.g., src; nuclear proteins, e.g., myc;
and tumor suppressors, e.g., retinoblastoma proteins.

[0139] The products of oncogenes are frequently proteins involved in cell
signaling, e.g., kinases, GTP-binding proteins, and receptors. For example, many
human cancers have a mutation in a ras gene (Alberts et al., 1994). The ras proteins
belong to a large superfamily of monomeric GTPases, and relay signals from receptor
tyrosine kinases to the nucleus, stimulating cell proliferation or differentiation. Ras
proteins function as switches, cycling between an active state, in which GTP is bound,
and an inactive state, in which GDP is bound. A ras gene mutation can result in the
translation of a protein that fails to hydrolyze its bound GTP, and persists abnormally
in its active state, transmitting an intracellular signal for cell proliferation or
differentiation even in the presence of regulatory non-proliferation and non-
differentiation signals. Oncogene-related proteins can possess one of many ras
protein domains (http://pfam.wustl.edu/cgi-bin/textsearch?terms=^^
what=all&sections=DE&sections=CC&size=100), including the sub-families Ras,
Rab, Rac, Ral, Ran, Rap, and Ypt1. Oncogene-related proteins can also possess a
Gtr1/RagA G-protein conserved region (gtr1_RagA) domain, which is found in some
G-proteins of the Ras family, e.g., the RagA/B human homologues of the ras GTP
binding protein Gtr1 (http://pfam.wustl.edu/cgi-bin/getdesc?name=Gtr1_RagA).
Oncogene-related sequences can also possess or interact with an ATPase domain
associated with diverse cellular activities; proteins with the AAA (ATPases
Associated with diverse cellular Activities) domain can perform chaperone-like
functions that assist in assembling, operating, or disassembling protein complexes.
The domain includes a conserved region of approximately 220 amino acids that
contains an ATP-binding site which can act as an ATP-dependent protein clamp to
hold a protein in place (http://pfam.wustl.edu/cgi-bin/getdesc?name=AAA). Some
oncogene-related sequences can also possess or interact with a C2 domain of
approximately 116 amino-acid residues, which can be involved in calcium-dependent
phospholipid binding and inositol-1,3,4,5-tetraphosphate binding, and is found, e.g.,
in some isozymes of protein kinase C (http://pfam.wustl.edu/cgi-
bin/getdesc?name=C2). C2 domains are typically located between C1 domains
(which bind phorbol esters and diacylglycerol) and protein kinase catalytic domains.
Regions with homology to the C2 domain are present in many proteins, e.g.,
synaptotagmin.

Parkinson's Disease

[0140] Parkinson's disease is a neurological disorder that affects movement
control. Complex interactions among groups of nerve cells in the central nervous
system coordinate to control movement. One such group of neurons is located in the
substantia nigra of the midbrain; these neurons release the neurotransmitter dopamine,
which allows an organism to fine-tune its movements. In Parkinson's disease, neurons
of the substantia nigra progressively degenerate, leaving the patient with clinical
symptoms that may include resting tremor, muscular rigidity, a slowness of
spontaneous movement, and poor balance and motor coordination (Seigel et al.,
1999).

[0141] Parkinson's disease has multiple causes, including both genes and the
environment. It also has multiple presentations, including juvenile-onset (before age
45) and adult onset (after age 45), and can be transmitted through either autosomal
dominant or autosomal recessive mechanisms. In keeping with the diversity of
etiologies, presentation, and genetic mechanisms, there are a large and diverse number
of genes and gene products involved in the pathogenesis of Parkinson's disease. For
example, the PARK2 gene, which encodes the protein parkin, is mutant in autosomal
recessive juvenile parkinsonism: PARK2 is a ubiquitin protein ligase that is a
component in the pathway that attaches ubiquitin to specific proteins, designating
them for degradation (Fishman and Oyler, 2002).

[0142] Parkinson's disease-related sequences can possess or interact with 
synuclein domains, which are expressed on the cytoplasmic regions of proteins found
predominantly in neurons (http://pfam.wustl.edu/cgi-bin/getdesc?name=Synuclein).
Alpha-synuclein, which possesses a synuclein domain, is mutated in several families
with autosomal dominant Parkinson's disease. Gamma-synuclein, which also
possesses a synuclein domain, is overexpressed in breast and ovarian cancers
(Lavedan, 1998).

Retinitis Pigmentosa 
[0143] Retinitis pigmentosa is a group of inherited retinopathies
characterized by early stage loss of night vision, followed by loss of peripheral vision.
Defects in any structural or functional proteins associated with the rod photoreceptor
neurons of the retina, which are the cells that transduce light into a neuronal action
potential, can lead to the disease (Seigel et al., 1999).

[0144] GTPase regulators have been implicated in the pathology of retinitis
pigmentosa. GTPase regulators are proteins that determine whether a GTP binding
protein exists in a GTP-bound or GDP-bound state (Zhao et al., 2003); they are
described in more detail below. GTPase regulators have a broad spectrum of
intracellular functions, including intracellular vesicular transport. These proteins
localize to a specific region of rod photoreceptor cells, in a narrow cilium that
connects the cell body, where protein synthesis and basic metabolism take place,
with the rod outer segment, where light is transduced to an action potential of the
optic nerve (Zhao et al., 2003). Proteins necessary for the light transduction process
are made in the cell body and must be transported to the outer segment via vesicular
transport mechanisms. Mutant GTPase regulators, which regulate vesicular transport,
play a role in the pathogenesis of retinitis pigmentosa (Roepman et al., 2000).
Retinitis pigmentosa-related sequences can possess or interact with a Tctex-1 domain,
which is comprised of a dynein light chain, and can bind to the cytoplasmic tail of
rhodopsins, which are light-sensing proteins present in retinal rod cells
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Tctex-1). Mutations in this domain
that are responsible for retinitis pigmentosa inhibit this binding.

Alzheimer's Disease 
[0145] Alzheimer's disease is a neurodegenerative dementing illness. It is a
genetically complex disease with multiple forms, including familial and sporadic
forms, and early onset and late-onset forms. Mutations in at least four genes are
known to cause Alzheimer's disease, and there is evidence for additional Alzheimer's
loci (McKusick, 2003). One form of Alzheimer's disease is caused by mutations in
the amyloid precursor gene, another form is associated with the apolipoprotein E4
allele, a third form is caused by a mutant presenilin-1 gene that encodes a seven-
transmembrane domain protein, and a fourth form is caused by a mutant gene
encoding a similar seven-transmembrane domain protein, presenilin-2 (McKusick,
2003).

[0146] Consistent with its multiple etiologies, multiple clinical presentations, 
and multiple genetic loci, Alzheimer's disease has a complex pathology. One facet of
the pathology of Alzheimer's disease is the formation of amyloid plaques from
amyloid precursor protein (Clark and Karlawish, 2003). Amyloid precursor protein
can be processed in vitro by several different proteases such as secretases and
caspases to yield peptide fragments, suggesting that these proteases may play a role in
the formation of pathogenic amyloid plaques in vivo (Suh and Checler, 2002).
Presenilins have been identified as likely candidates for the proteases that cleave
amyloid precursor protein to pathogenic peptide fragments in vivo (Selkoe, 2001).
Another facet of Alzheimer's disease pathology is an inflammatory component
mediated by microglial cells, the brain's primary immunoeffector cells (Tan et al.,
1999). Microglial cells are attracted to and activated by amyloid deposits; they
release inflammatory mediators that promote the aggregation of the deposits into
plaques, and also directly induce or promote neurodegeneration (Hoozei^
2002). Therefore, current treatment strategies include anti-inflammatory and
immunotherapeutic approaches, including vaccines (Weiner and Selkoe, 2002).

[0147] Alzheimer's disease-related sequences can possess or interact with
trypsin domains, which demonstrate a wide range of peptide degrading activities,
including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activities
(http://pfam.wustl.edu/cgi-bin/getdesc?name=trypsin). Alzheimer's disease-related
sequences can also possess or interact with low-density lipoprotein receptor (ldl_rece)
domains, which are characterized by seven successive cysteine-rich repeats of about
40 amino acids at the N-terminal region, and which are also present in receptors for
low density lipoprotein (LDL), the major cholesterol-carrying lipoprotein of plasma
(http://pfam.wustl.edu/cgi-bin/textsearch?terms=ldl^^+&search_what=all&
sections=DE&sections=CC&size=100). Alzheimer's disease-related sequences can
also possess or interact with a PT repeat (pt_a) domain, which includes the
tetrapeptide XPTX, or a similar, conserved, sequence.

Williams-Beuren Syndrome

[0148] Williams-Beuren syndrome is a complex genetic developmental
disorder with multisystemic manifestations, and variability in its presentation. In 90-
95% of the cases reported, a gene deletion occurs at the 7q11.23 location on the long
arm of chromosome 7; in the remaining cases, a variety of other chromosomal
deletions and translocations have been observed (Wang et al., 1999). The most severe
cases are characterized by cardiac anomalies, including aortic stenosis, mental
retardation, growth deficiency, a characteristic facial appearance, dental
malformation, and infantile hypercalcemia (Lashkari et al., 1999).

[0149] The underlying molecular basis for the syndrome is the absence of
the proteins encoded by the genes of the affected region of the chromosome. A
missing elastin gene, with resulting extracellular matrix anomalies, is a consistent
finding. Other genes that are present in and near the commonly deleted region of
chromosome 7, and thus are likely to contribute to pathogenesis, are (1) a gene
encoding a regulator of chromosome condensation-like G-exchanging factor, which is
a factor that exchanges nucleotides for small GTP-binding proteins, (2) an N-
acetylgalactosaminyltransferase, (3) a DNAJ-like chaperone, (4) NOL1/NOP2/sun
domain-containing proteins, including a novel protein designated WBSCR20, which
is expressed in skeletal muscle, and is similar to a 120 kilodalton proliferation-
associated nucleolar antigen, (5) a methyltransferase designated WBSCR22, and (6)
other proteins with no known homologies (Merla et al., 2002; Doll and Grzeschik,
2001). Williams-Beuren-related sequences can possess or interact with a GTF2I-like
repeat (GTF2I) domain, which is a DNA binding domain commonly deleted in
Williams-Beuren syndrome (http://pfam.wustl.edu/cgi-bin/getdesc?name=GTF2I).

Rheumatic Diseases 

[0150] Rheumatic diseases are inflammatory conditions that can have 
autoimmune, infective, or traumatic origins. They include arthritis, systemic lupus
erythematosus, scleroderma, and Sjogren's syndrome. Arthritis refers to any
inflammation of a joint. Systemic lupus erythematosus is an autoimmune disease in
which patients produce antibodies to their own tissues, resulting in an inflammatory
process that can damage organs. Scleroderma can present as systemic scleroderma, a
chronic, progressive disease that is characterized by hardening and stiffening of the
skin and damage to internal organs, e.g., heart, lungs, kidneys and esophagus. 
Sjogren's syndrome is a progressive immunological disorder characterized by 
inflammation and the subsequent destruction of exocrine glands, e.g., salivary glands, 
sweat glands, and lacrimal (tear) glands. 

[0151] The serum of patients with scleroderma and Sjogren's syndrome has
antibodies directed against a protein that is a normal component of the Golgi
apparatus (Seelig et al., 1994), an intracellular organelle composed of a stack of
flattened cisternae with associated transport vesicles. The Golgi apparatus sorts
proteins and sends them to their correct intracellular destination. This antigenic
protein is a "golgin," one of a class of molecules characterized by an integral
membrane domain and a large cytoplasmic region. Golgins organize the Golgi's
structure, and influence protein sorting (Gillingham et al., 2002). Golgins function in
a variety of ways, including cross-bridging Golgi cisternae to one another (Linstedt
and Hauri, 1993) and tethering Golgi transport vesicles to the cisternal membranes
(Shorter et al., 2002). Rheumatic disease-associated sequences can possess or interact
with golgin-97, RanBP2alpha, Imh1p, and p230/golgin (GRIP) domains, which are
found in many large coiled-coil proteins, are sufficient for targeting to the Golgi, and
have a conserved tyrosine residue (http://pfam.wustl.edu/cgi-bin/getdesc?
name=GRIP).

Disintegrin-Related Sequences

[0152] Disintegrins are proteins that interfere with the function of
integrins. Disintegrins are generally proteins of about 70 amino acid residues that
contain multiple disulfide bonds, bind with high affinity to a subset of integrins, and
interfere with integrin binding to physiological ligands. Examples of disintegrin-
related sequences include snake venoms and related proteins, cysteine-rich
metalloproteinases and related non-enzymatic sequences, e.g., those expressed in the
male reproductive tract, and membrane-anchored metalloproteinases with diverse
functions, e.g., the shedding of cell-surface proteins such as cytokines and cytokine
receptors, and the conferring of asthma susceptibility (Van Eerdewegh et al., 2002;
Perry et al., 1995).

[0153] Disintegrin-related sequences can possess or interact with 
disintegrin domains, which contain an Arg-Gly-Asp sequence, a sequence commonly 
found in adhesion proteins (http://pfam.wustl.edu/cgi-bin/getdesc?name=disintegrin).
Proteins that comprise both disintegrin and metalloproteinase peptidase domains
include ADAM proteins. Disintegrin-related sequences can also possess or interact
with reprolysin family propeptide (Pep_M12B_propep) domains, which are domains
that include the propeptide sequence of members of the peptidase family M12B, and
contain a sequence motif similar to a sequence found in matrixin proteins
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Pep_M12B_propep).

Factor-Related Sequences

[0154] A factor is any molecule that contributes to a bodily process. Factors
can function in specific biochemical reactions and cellular functions. There are many
categories of factors, and factors are involved in many, if not all, physiological and
pathological processes. Some exemplary factors are described in the following
paragraphs; they are not exhaustive of the category.

[0155] Transcription factors are factors that initiate or regulate transcription
in eukaryotes. They include gene regulatory proteins, which turn specific sets of
genes on or off, and general transcription factors, which assemble at the promoter
region to enable and regulate transcription of many genes. They also include
transcription elongation factors, which are proteins required for the addition of amino
acids to growing polypeptide chains on ribosomes (Alberts et al., 1994).
Transcription factors interact with a wide variety of molecules, including DNA
binding proteins, polymerases, regulatory molecules such as kinases, and specific
regions of DNA, e.g., promoters, and enhancers (Alberts et al., 1994; Vallejo et al.,
1993).

[0156] Translation factors, including translation initiation factors and release
factors, are involved in initiating and regulating the rate of protein synthesis. They
also interact with many molecules, including ribosomal proteins, mRNA, and
molecules that regulate the incorporation of amino acids into protein, such as kinases
and GTP (Price et al., 1993; Alberts, 1994).

[0157] Export factors are involved in the export of molecules, e.g., RNA,
from the nucleus (State et al., 2000). Folding factors are involved in the process of
folding proteins into their functional three dimensional shapes, and are also involved
in receptor function (Gao et al., 1994). Factors such as activators and coactivators
interact with nuclear receptors to modulate cellular processes, e.g., transcription
(Mahajan et al., 2002).

[0158] ADP-ribosylation factors are involved in the addition of an ADP-
ribose group donated from nicotinamide adenine dinucleotide (NAD) to specific
amino acid residues in heterotrimeric G-proteins. They are involved in, for example,
normal cellular processes, such as vesicular transport, and also in the pathologic states
induced by cholera, pertussis, and botulinum toxins (Alberts et al., 1994; Amor et al.,
1994). Guanine nucleotide exchange factors bind to small G-proteins, such as Ras,
and displace GDP in favor of GTP. They act as effectors or modulators of small G-
proteins (Ehrhardt et al., 2001; Janeway et al., 2001; Shao and Andres, 2000).

[0159] Factor-related sequences can possess or interact with ADP-
ribosylation factor family (arf) domains, which are GTP-binding domains involved in
protein trafficking (http://pfam.wustl.edu/cgi-bin/). Factor-related
sequences can also possess or interact with elongation factor Tu GTP binding
(GTP_EFTU) domains, which are elongation factors that promote the GTP-dependent
binding of aminoacyl tRNA to ribosomes during protein biosynthesis, and catalyze
the translocation of the newly synthesized protein chain (http://pfam.wustl.edu/cgi-
bin/getdesc?name=GTP_EFTU). Factor-related sequences can also possess or
interact with 4F5 protein family (4F5) domains, which comprise ubiquitously
expressed short proteins rich in aspartate, glutamate, lysine and arginine
(http://pfam.wustl.edu/cgi-bin/getdesc?name=4F5). Factor-related sequences can also
possess or interact with eukaryotic initiation factors, e.g., eukaryotic initiation factor
4E (IF4E), which recognizes and binds mRNA during an early step of protein
synthesis (http://pfam.wustl.edu/cgi-bin/getdesc?name=IF4E).

Germ Cell Specific Protein-Related Sequences

[0160] Germ cells, also called gametes, are cells that contribute to a
new generation of organisms by giving rise to either an egg or a sperm. They are
haploid cells specialized for sexual fusion. Proteins that are specific to germ cells can
be found at one or more developmental stages of gametes. 

[0161] Germ cell-related sequences include germ cell genes and their 
gene products, their regulators and effectors, genes and gene products affected in
disorders associated with germ cells, and antibodies that specifically recognize or
modulate germ cell-related sequences. Examples of germ cell-related sequences
include the germ cell-specific Y-box binding protein and contrin. Germ cell specific
protein-related sequences possess or interact with the cold-shock DNA-binding (CSD) 
domain, which is described above. 

Growth Factor-Related Sequences 

[0162] A growth factor is an extracellular polypeptide signaling molecule
that stimulates a cell to grow or proliferate. Many types of growth factors exist,
including protein hormones and steroid hormones. Some growth factors have a broad
specificity, and some have a narrow specificity. Examples of growth factors with
broad specificity include platelet-derived growth factor, epidermal growth factor,
insulin like growth factor I, transforming growth factor β, and fibroblast growth
factor, which act on many classes of cells. Examples of growth factors with narrow
specificity include erythropoietin, which induces proliferation of precursors of red
blood cells, interleukin-2, which stimulates proliferation of activated T-lymphocytes,
interleukin-3, which stimulates proliferation and survival of various types of blood
cell precursors, and nerve growth factor, which promotes the survival and the
outgrowth of nerve processes from specific classes of neurons.

[0163] Most growth factors have other actions in addition to inducing cell 
growth or proliferation, e.g., they may influence survival, differentiation, migration, 
or other cellular functions. Growth factors can have complex effects on their targets,
e.g., they may act on some cells to stimulate cell division, and on others to inhibit it.
They may stimulate growth at one concentration, and inhibit it at another. Growth
factors are also involved in tumorigenesis.

[0164] Growth factor-related sequences include sequences associated with
the process of stimulating cell growth or proliferation by a growth factor. For
example, they include intracellular effectors of growth, such as components of
intracellular pathways that respond to growth factors (Kothapalli et al., 1997; Wax et
al., 1994), sequences that bind directly or indirectly to growth factors (Van den
Berghe et al., 2000), and sequences affected as a result of growth factor action.

[0165] Growth factor-related sequences can possess or interact with a
transforming growth factor beta like (TGF-beta) domain, which is a multifunctional
peptide sequence that controls proliferation, differentiation and other functions in
many cell types (http://pfam.wustl.edu/cgi-bin/getdesc?name=TGF-beta). Growth
factor-related sequences can also possess or interact with a fibroblast growth factor
(FGF) domain, which is found in a family of proteins involved in growth and
differentiation (http://pfam.wustl.edu/cgi-bin/getdesc?name=FGF).

GTPase-Related Sequences 

[0166] GTPases are enzymes that catalyze GTP hydrolysis, and
comprise a large family of proteins with a similar globular GTP binding domain.
When GTP is bound to a GTPase, it is hydrolyzed to GDP, and the domain undergoes
a conformational change that inactivates the protein. GTPases are regulated by
GTPase regulators, proteins that determine whether a GTP binding protein exists in a
GTP-bound or GDP-bound state (Zhao et al., 2003). GTPase regulators include
GTPase activating proteins, which bind the GTPase and induce it to hydrolyze its
bound GTP to GDP; the GTPase remains in an inactive, GDP-bound state until it
encounters a guanine nucleotide releasing protein, which binds to the GTPase and
causes the release of the nucleotide. GTPases have a broad spectrum of intracellular
functions, including intracellular vesicular transport. Examples of GTPase-related
sequences include ras, GTPase-activating proteins, and guanine nucleotide releasing
proteins.
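
The regulatory cycle described above can be sketched as a small two-state model in Python. This is only an illustrative toy under stated assumptions: the class and method names are hypothetical, GTP is assumed to bind as soon as the releasing protein has displaced the bound nucleotide, and the `can_hydrolyze` flag stands in for a mutant that fails to hydrolyze its bound GTP.

```python
from dataclasses import dataclass


@dataclass
class ToyGTPase:
    """Hypothetical two-state model of a small GTPase switch."""

    nucleotide: str = "GDP"      # nucleotide currently bound: "GDP" or "GTP"
    can_hydrolyze: bool = True   # False models a hydrolysis-defective mutant

    @property
    def active(self) -> bool:
        # The switch is treated as "on" only while GTP is bound.
        return self.nucleotide == "GTP"

    def meet_releasing_protein(self) -> None:
        # The releasing protein causes release of the bound nucleotide;
        # this model assumes GTP then occupies the empty site.
        self.nucleotide = "GTP"

    def meet_activating_protein(self) -> None:
        # The GTPase activating protein induces hydrolysis of bound GTP to GDP,
        # unless this (assumed) mutant cannot hydrolyze.
        if self.nucleotide == "GTP" and self.can_hydrolyze:
            self.nucleotide = "GDP"


normal = ToyGTPase()
normal.meet_releasing_protein()    # switched on
normal.meet_activating_protein()   # switched off again

mutant = ToyGTPase(can_hydrolyze=False)
mutant.meet_releasing_protein()    # switched on
mutant.meet_activating_protein()   # stays on: GTP is never hydrolyzed
```

In this toy model the normal protein returns to the GDP-bound state after meeting the activating protein, while the hydrolysis-defective variant remains GTP-bound.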

[0167] GTPase-related sequences can possess or interact with GTPase 
activator protein for Ras-like GTPase (RasGAP) domains, which are protein domains 
of about 250 residues that accelerate the GTPase activity of ras
(http://pfam.wustl.edu/cgi-bin/getdesc?name=RasGAP). GTPase-related sequences
can also possess or interact with putative GTPase activating protein for ARF (ArfGap)
domains, which are protein domains with a zinc finger involved in intermolecular
associations (http://pfam.wustl.edu/cgi-bin/getdesc?name=ArfGap). GTPase-related
sequences can also possess or interact with ankyrin repeat domains (ank), which are
tandemly repeated modules of about 33 amino acids found in a variety of functionally
diverse proteins (http://pfam.wustl.edu/cgi-bin/getdesc?name=ank). GTPase-related
sequences can also possess or interact with pleckstrin homology (PH) domains, which
are protein domains of about 100 residues involved in intracellular signaling, or as
components of the cytoskeleton (http://pfam.wustl.edu/cgi-bin/getdesc?na^).

Heat-Shock Protein-Related Sequences

[0168] Heat-shock proteins, also referred to as stress-response proteins, are 
proteins that are synthesized in response to an elevated temperature or other cell
stressor, and help the cell withstand environmental insults. A cell stressor can induce
a battery of genes that encode gene products that protect the cell from the result of the
insult, e.g., proteins that stabilize and repair partially denatured cell proteins. Some
heat-shock proteins, e.g., chaperones, are present at high levels in unstressed cells,
and further induced by stress. Chaperones assist other proteins in attaining their
proper secondary and tertiary structures. For example, members of the tubulin-
specific chaperone A family possess tubulin-specific chaperone A (TBCA) domains 
that fold tubulin polypeptides into their functional configuration 
(http://pfam.wustl.edu/cgi-bin/getdesc?name=TBCA). 

[0169] Heat and other stressors further induce the synthesis of a family of
90-kDa heat-shock proteins that are already abundant in unstressed cells (Pepin et al.,
2001; Lees-Miller et al., 1989; Rebbe et al., 1987). Members of this family possess a
hsp90 protein (HSP90) domain that interacts with tubulin, actin, tyrosine kinase
oncogene products of retroviruses, eIF2alpha kinase, and steroid hormone receptors
(Lees-Miller and Anderson, 1989). This domain includes a highly-conserved N-
terminal region, separated from a conserved, acidic C-terminal region by a highly-
acidic, flexible linker region (http://pfam.wustl.edu/cgi-bin/getdesc?name=HSP90).

[0170] Another family of heat-shock proteins, the hsp70 proteins, have an 
average molecular weight of 70 kDa; some members of this family are only expressed
under conditions of stress, while some are present in cells under normal conditions.
Hsp70 proteins reside in different cellular compartments, e.g., the nucleus, cytosol,
mitochondria, and endoplasmic reticulum. Hsp70 proteins, e.g., Hsc73, can be
differentially expressed at different stages of development (Soulier et al., 1996).
Hsp70 proteins, e.g., the chaperone hsp70-like dnaK protein, can associate with
proteins that possess a DnaJ domain, which comprises an N-terminal conserved
domain of about 70 amino acids, a glycine-rich region of about 30 amino acids, a
central domain containing four repeats of a CXXCXGXG motif, and a C-terminal
region of 120 to 170 amino acids (http://pfam.wustl.edu/cgi-bin/getdesc?
name=DnaJ). Proteins with DnaJ domains can be posttranslationally modified by
farnesylation (Andres et al., 1997).

Helicase-Related Sequences

[0171] Helicases are enzymes that use energy from the hydrolysis of
ATP to unwind the DNA helix at the replication fork, allowing the single strands to be
copied. Proteins with DNA helicase activity play roles in DNA replication, repair,
and recombination. Disorders associated with helicases include xeroderma
pigmentosum, Cockayne syndrome, diffuse collagen disease, alpha-thalassemia,
Bloom syndrome, Werner syndrome, and Rothmund-Thomson syndrome (Miyajima,
2002). Examples of helicases include RNA helicases, RECQL4, and
minichromosome maintenance helicase.

[0172] Helicase-related sequences can possess or interact with helicase
associated (HA) domains, which are protein domains comprising alpha helices that
may bind to nucleic acids (http://pfam.wustl.edu/cgi-bin/getdesc?name=HA).
Helicase-related sequences can also possess or interact with helicase conserved
C-terminal (helicase_C) domains, which are protein domains that are found in a subset
of helicases designated the DEAD/H helicases
(http://pfam.wustl.edu/cgi-bin/getdesc?name=helicase_C).

Hydrolase-Related Sequences

[0173] Hydrolases are enzymes that catalyze the hydrolysis of a variety
of bonds, such as esters, glycosides, and peptides. Hydrolases split a molecule into
fragments by adding water; the water's hydrogen atom is incorporated into one
fragment, and the hydroxyl group is incorporated into another. Hydrolases are
involved in a wide range of physiological and pathological processes, including
proteolysis, phosphatase activity, and sugar metabolism. Examples of hydrolases
include protein hydrolases, lipid hydrolases, nucleic acid hydrolases, and small
molecule, e.g., coenzyme A, hydrolases (Hawes et al., 1996).

[0174] Hydrolase-related sequences can possess or interact with
alpha/beta hydrolase fold (abhydrolase) domains, which are catalytic domains found
in a wide range of hydrolytic enzymes of different phylogenetic origins and catalytic
functions (http://pfam.wustl.edu/cgi-bin/getdesc?name=abhydrolase).
Hydrolase-related sequences can also possess or interact with dUTPase domains, which are
protein domains that hydrolyze dUTP to dUMP and pyrophosphate.

Immune Cell-Related Sequences 

[0175] An immune cell is a cell involved in, or associated with, the immune
system. Immune cells include cells in the myeloid and lymphocytic arms of the
immune response, as well as their precursors. Immune cells also include cells at all
stages in the differentiation pathways that produce cells associated with the immune
system. These cells can reside, either permanently or temporarily, in the spleen,
lymph nodes or mucosal-associated lymphoid tissues (MALT). Immune cell-related
sequences are involved in all functions of the immune response, e.g., antibody
production and cell-mediated immunity, and can function at any point in time, ranging
from the embryonic formation of the immune system, through the time of an immune
challenge, to many decades later, e.g., when a B-cell memory response is invoked
(Janeway, 2001).

[0176] Immune-cell related sequences of differentiating immune cells
include pre-B cells that do not produce immunoglobulin light chain, but express a
transcript homologous to immunoglobulin lambda light-chain genes, the expression of
which is limited to pre-B cells and select other cells that have no surface
immunoglobulin (Hollis et al., 1989). Immune-cell related sequences of activated
immune cells include a B-cell-restricted transcription factor expressed by activated B
cells; its expression pattern suggests it has a role in regulating B-cell differentiation
(Massari et al., 1998).

[0177] Examination of the expression of immune-cell related sequences can
detect and diagnose immunoregulatory abnormalities. For example, genes that
encode proteins which mediate the combinatorial process that combines a finite
number of component genes into the very broad range of antigen-specific
immunoglobulin and T-cell binding proteins, are expressed at higher levels in patients
with systemic lupus erythematosus (SLE) than in healthy subjects (Girschick et al.,
2002).

[0178] Immune cell-related sequences can possess or interact with a CUB
domain, which is an extracellular domain of approximately 110 amino acids, and is
present in functionally diverse, including developmentally regulated, proteins
(http://pfam.wustl.edu/cgi-bin/getdesc?name=CUB). Immune cell-related sequences
can also possess or interact with a CD-20 domain, which has four transmembrane
regions, both extracellular and cytoplasmic extensions, and is found, inter alia, in a
high affinity IgE receptor (http://pfam.wustl.edu/cgi-bin/getdesc?name=...).
Immune cell-related sequences can also possess or interact with an interferon-induced
transmembrane protein (CD225) domain, which is found in a family of proteins that
includes the human leukocyte antigen CD225, an interferon-inducible transmembrane
protein associated with interferon-induced cell growth suppression
(http://pfam.wustl.edu/cgi-bin/getdesc?name=CD225). Immune cell-related sequences
can also possess or interact with sushi domains, also known as complement control
protein (CCP) modules, or short consensus repeats (SCR). These domains are found in a
wide variety of complement and adhesion proteins, including proteins responsible for the
antigenicity of blood group antigens on the external face of the red blood cell
membrane (http://pfam.wustl.edu/cgi-bin/getdesc?name=sushi). Immune cell-related
sequences can also possess or interact with SH2 domains and rvt domains; both are
described above.

Integrase-Related Sequences 

[0179] Integrases are enzymes that form proviruses by inserting a linear
double-stranded DNA copy of a retroviral genome into host cell DNA. Examples of
integrases include HIV integrase, PhiC31 integrase, and Sip.

[0180] Integrase-related sequences can possess or interact with an
integrase zinc binding (Integrase_Zn) domain, which is a zinc binding protein
domain placed near the N-terminus
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Integrase_Zn). Integrase-related
sequences can also possess or interact with an
integrase core (rve) domain, which is a protein domain that forms the central catalytic
core of the integrase (http://pfam.wustl.edu/cgi-bin/getdesc?name=rve). This domain
acts as an endonuclease to cleave the nucleotide and catalyzes the transfer of the viral
DNA strand to the integration site of the host DNA. Integrase-related sequences also
possess or interact with an integrase DNA binding (integrase) domain, which is a
DNA-binding protein domain near the C-terminus
(http://pfam.wustl.edu/cgi-bin/getdesc?name=integrase). Integrase-related sequences
also possess or interact with
reverse transcriptase (rvt) domains, which are described above. Integrase-related
sequences also possess or interact with an RNase H domain, which is a protein domain
that hydrolyzes the RNA portion of RNA/DNA hybrids
(http://pfam.wustl.edu/cgi-bin/getdesc?name=rnaseH).

Integrin-Related Sequences 

[0181] Integrins are transmembrane proteins that mediate cell to cell as
well as cell to matrix adhesion, and provide a means of communication between the
interior of a cell and the extracellular matrix. The extracellular portion of integrins
binds to components of the extracellular matrix, e.g., collagen, fibronectin and
laminin. The intracellular portion of integrins interacts with the cell cytoskeleton,
e.g., actin filaments near the cell surface. Integrins transmit information about the
extracellular environment across the plasma membrane to the cytoskeleton, where it is
available to intracellular signaling mechanisms (Alberts et al., 1994). Structurally,
integrins consist of heterodimers of an alpha and a beta subunit. Each subunit has a
large N-terminal extracellular domain followed by a transmembrane domain and a
short C-terminal cytoplasmic region. The pairing of certain alpha subunits with
certain beta subunits determines ligand specificity, localization and function. The
extracellular binding domains of integrins often bind their ligands with low affinity;
simultaneous, weak binding with multiple matrix molecules provides the cell with a
means to sense its complex, changing extracellular environment without becoming
glued to it. Examples of integrin-related sequences include integrin alpha and beta
subunits, collagens, and integrin-linked kinase (Zhang et al., 2002).

[0182] Integrin-related sequences can possess or interact with von
Willebrand factor type A (vwa) domains, which are protein domains that participate
in diverse biological functions, e.g., cell adhesion, migration, homing, pattern
formation, and signal transduction (http://pfam.wustl.edu/cgi-bin/getdesc?name=vwa).
Integrin-related sequences can also possess or interact with FG-GAP
repeat (FG-GAP) domains, which are protein domains present in the vicinity of ligand
binding domains at the N-terminus of integrin alpha subunits
(http://pfam.wustl.edu/cgi-bin/getdesc?name=FG-GAP).

Interacting Protein-Related Sequences 

[0183] An "interacting protein" is a protein that interacts with another
molecule. Interacting proteins are involved in every aspect of cellular function.
Interacting proteins have been characterized in all known locations in the cell, and
include all, or most types of, proteins. Interacting proteins in the nucleus regulate
such diverse functions as apoptosis, transcription, homologous recombination, and
DNA repair. Nuclear fibroblast growth factor-2 interacting factor interacts with
fibroblast growth factor 2 to prevent apoptosis (Van den Berghe et al., 2000). Grap2
cyclin-D interacting protein (GCIP), a nuclear cell-cycle protein, inhibits select
transcriptional events, and reduces the level of phosphorylation of nuclear
retinoblastoma protein (Chang et al., 2000). Pir51, a human homologue of RecA, a
bacterial enzyme that mediates genetic recombination, interacts with the enzyme
rad51 to regulate homologous recombination and DNA repair in mammalian cells
(Kovalenko et al., 1997). Hepatitis B virus X-associated protein (HBXAP), a protein
demonstrated to play a role in the development of hepatocellular carcinoma, interacts
with the hepatitis B virus regulatory gene product HBx to increase viral transcription
(Shamay et al., 2002).

[0184] Interacting protein-related proteins can utilize many protein domain
motifs for interaction. They can possess or interact with domains that mediate
interaction with DNA, RNA, ions, or other proteins. For example, PDZ domains,
which are also known as DHR or GLGF domains, target signaling molecules to
membranes and mediate the assembly of functional membrane domains (Fanning and
Anderson, 1999). Interacting protein-related proteins can also possess or interact with
rrm domains, which are described above.

Isomerase-Related Sequences

[0185] Isomerases are enzymes that convert molecules into their
positional isomers, i.e., into molecules with the same chemical formula but a different
stereochemical arrangement of atoms. Isomerases act on a wide variety of molecules,
including sugars, amino acids, and nucleic acids. They are involved in a wide range
of physiological and pathological functions, including those involving metabolic and
synthetic pathways.

[0186] Isomerase-related sequences include isomerase genes and gene
products, their substrates, products, activators, inhibitors, effectors, and cofactors,
regulatory molecules that modulate their function, genes and gene products affected in
disorders associated with isomerases, and antibodies that specifically recognize or
modulate isomerase-related sequences. Examples of isomerase-related sequences
include triosephosphate isomerases, peptidyl-prolyl isomerases, glucose phosphate
isomerases, disulfide isomerases, ketosteroid isomerases, and
ribosyltransferase-isomerases (Brown et al., 1985).

[0187] Isomerase-related sequences can possess or interact with
triosephosphate isomerase (TIM) domains, which are protein domains that catalyze
the reversible interconversion of glyceraldehyde 3-phosphate and dihydroxyacetone
phosphate (http://pfam.wustl.edu/cgi-bin/getdesc?name=TIM). Isomerase-related
sequences can also possess or interact with cyclophilin type peptidyl-prolyl cis-trans
isomerase (pro_isomerase) domains, which accelerate protein folding by catalyzing
the cis-trans isomerization of peptide bonds
(http://pfam.wustl.edu/cgi-bin/getdesc?name=pro_isomerase).

Mucin-Related Sequences

[0188] The term mucin refers to both an albumin-like substance that is
present in mucus, and to transmembrane proteins that can typically be produced in
both soluble and transmembrane forms. Soluble mucins comprise mucus gels that
protect epithelial cells in the airways, digestive tract, and other organs, and are found
in body fluids, such as milk, tears, and saliva. In their transmembrane forms, mucins
provide a steric barrier to protect the apical surface of epithelial cells.
Transmembrane mucins are also involved in pathogenesis; for example, they mediate
viral entry into cells, promulgate the inflammatory response, and are involved in the
regulation of abnormal cell proliferation (Jeffery and Zhu, 2002; Tsuda et al., 1993).
Examples of mucins include MUC2 mucin, mucin carcinoembryonic antigen, and
Muc3 membrane bound intestinal mucin.

[0189] Mucin-related sequences can possess or interact with mucin-like
glycoprotein (tryp_mucin) domains, which are domains that are involved in the
interaction of parasites with host cells
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Tryp_mucin). Mucin-related sequences
can also possess or
interact with multi-glycosylated core protein (MGC-24) domains, which are protein
domains of sialomucins that are expressed in many normal and cancerous tissues
(http://pfam.wustl.edu/cgi-bin/getdesc?name=MGC-24).

Other Polypeptide-Related Sequences

[0190] In addition to the sequences described above, the sequences of
the invention include nucleotide and amino acid sequences, some with known
function, and some with unknown function, that fall into a broad array of categories.
These sequences are listed below in SEQ ID NOS.: 210 - 418, as "Other Polypeptides
with Known Function," and "Other Polypeptides," respectively.

[0191] Polypeptide-related sequences of the invention can possess or
interact with groucho/TLE N-terminal Q-rich (TLE_N) domains, which are protein
domains found in co-repressor proteins, and are involved in oligomerization
(http://pfam.wustl.edu/cgi-bin/getdesc?name=TLE_N). Polypeptide-related
sequences of the invention can also possess or interact with uncharacterized protein
family 0160 (UPF0160) domains, which are protein domains found in proteins that
include multiple metal-binding residues, and in some cases act as a phosphodiesterase
(http://pfam.wustl.edu/cgi-bin/getdesc?name=UPF0160). Polypeptide-related
sequences of the invention can also possess or interact with SNF7 domains, which are
protein domains involved in protein sorting and transport from the endosome to the
lysosome or vacuole of eucaryotic cells
(http://pfam.wustl.edu/cgi-bin/getdesc?name=SNF7). Polypeptide-related sequences
of the invention can also possess or
interact with NifU-like N-terminal (NifU_N) domains, which are protein domains
involved in nitrogen fixation and other functions
(http://pfam.wustl.edu/cgi-bin/getdesc?name=NifU_N). Polypeptide-related sequences
of the invention can also
possess or interact with tRNA synthetases class II (D, K and N) (tRNA-synt_2)
domains, which are protein domains that activate the amino acids asparagine,
aspartic acid, and lysine, and transfer them to specific tRNA molecules
(http://pfam.wustl.edu/cgi-bin/getdesc?name=tRNA-synt_2).

[0192] Polypeptide-related sequences of the invention can also possess
or interact with dynein heavy chain (dynein_heavy) domains, which are protein
domains that correspond to the C-terminal region of the dynein heavy chain
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Dynein_heavy). Polypeptide-related
sequences of the invention can also possess or interact with cyclin-dependent kinase
regulatory subunit (CKS) domains, which are protein domains of approximately
79-150 amino acid residues that are involved in regulating progression through the cell
cycle (http://pfam.wustl.edu/cgi-bin/getdesc?name=CKS).

[0193] Polypeptide-related sequences of the invention can also possess
or interact with nucleoside diphosphate linked to some other moiety X (NUDIX)
domains, which are protein domains that are involved in removing oxidatively
damaged nucleotides (http://pfam.wustl.edu/cgi-bin/getdesc?name=NUDIX).
Polypeptide-related sequences of the invention can also possess or interact with
T-complex protein/cpn60 chaperonin (cpn60_TCP1) domains, which are protein
domains involved in protein folding and oligomerization
(http://pfam.wustl.edu/cgi-bin/getdesc?name=cpn60_TCP1). Polypeptide-related
sequences of the invention can
also possess or interact with F-actin capping protein, beta subunit (F_actin_cap_B)
domains, which are protein domains of approximately 280 amino acids that are
involved in capping actin, i.e., blocking the exchange of actin monomers
(http://pfam.wustl.edu/cgi-bin/getdesc?name=F_actin_cap_B).

[0194] Polypeptide-related sequences of the invention can also possess
or interact with G-protein alpha subunit (G-alpha) domains, which are protein
domains that bind guanyl nucleotides, and function as a GTPase
(http://pfam.wustl.edu/cgi-bin/getdesc?name=G-alpha). Polypeptide-related sequences
of the invention
can also possess or interact with Krüppel-associated box (KRAB) domains, which are
protein domains involved in protein-protein interactions, and present in some zinc
finger proteins (http://pfam.wustl.edu/cgi-bin/getdesc?name=KRAB).
Polypeptide-related sequences of the invention can also possess or interact with
metallopeptidase family M24 (Peptidase_M24) domains, which are protein domains that
are found in some metalloproteases, including proline dipeptidase and methionine
aminopeptidase (http://pfam.wustl.edu/cgi-bin/getdesc?name=Peptidase_M24).
Polypeptide-related
sequences of the invention can also possess or interact with thioredoxin (thiored)
domains, which are protein domains involved in oxidation-reduction reactions by
reversibly oxidizing disulfide bonds
(http://pfam.wustl.edu/cgi-bin/getdesc?name=thiored).

[0195] Polypeptide-related sequences of the invention can also possess
or interact with TUDOR domains, which are protein domains involved in the
formation of primordial germ cells, and in normal abdominal segmentation
(http://pfam.wustl.edu/cgi-bin/getdesc?name=TUDOR). Polypeptide-related
sequences of the invention can also possess or interact with SIT4
phosphatase-associated protein (SAPS) domains, which are protein domains that are
involved in cyclin transcription (http://pfam.wustl.edu/cgi-bin/getdesc?name=SAPS).
Polypeptide-related sequences of the invention can also possess or interact with
ankyrin repeat (ank) domains, which are protein domains of approximately 33 amino
acids, and are sometimes found in tandemly repeated modules
(http://pfam.wustl.edu/cgi-bin/getdesc?name=ank). Polypeptide-related sequences of
the invention can also
possess or interact with nicotinamide N-methyltransferase/phenylethanolamine
N-methyltransferase/thioether S-methyltransferase (NNMT_PNMT_TEMT) domains,
which are protein domains that are found in proteins that use
S-adenosyl-L-methionine as the methyl donor
(http://pfam.wustl.edu/cgi-bin/getdesc?name=NNMT_PNMT_TEMT). Polypeptide-related
sequences of the invention can also
possess or interact with C1q domains, which are protein domains involved in
activating the serum complement system
(http://pfam.wustl.edu/cgi-bin/getdesc?name=C1q). Polypeptide-related sequences of
the invention can also possess or
interact with collagen triple helix repeat (Collagen) domains, which are protein
domains that typically form extracellular connective tissue
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Collagen).

[0196] Polypeptide-related sequences of the invention can also possess
or interact with the hyaluronan/mRNA binding family (HABP4_PAI-RBP1) domain,
which is a protein domain that can bind to the glycosaminoglycan hyaluronan, and to
RNA (http://pfam.wustl.edu/cgi-bin/getdesc?name=HABP4_PAI-RBP1).
Polypeptide-related sequences of the invention can also possess or interact with
eucaryotic aspartyl protease (asp) domains, which are protein domains that cleave
peptide bonds; proteins with this domain include pepsins, cathepsins, and rennin
(http://pfam.wustl.edu/cgi-bin/getdesc?name=asp). Polypeptide-related sequences of
the invention can also possess or interact with trypsin domains, which are protein
domains that function as serine proteases
(http://pfam.wustl.edu/cgi-bin/getdesc?name=trypsin). Polypeptide-related sequences
of the invention can also possess or
interact with Kunitz/Bovine pancreatic trypsin inhibitor (Kunitz_BPTI) domains,
which are protein domains that are found in serine protease inhibitors
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Kunitz_BPTI). Polypeptide-related
sequences of the
invention can also possess or interact with proliferating cell nuclear antigen,
N-terminal (PCNA) domains, which are protein domains that are found on non-histone
acidic nuclear proteins, and play a role in controlling DNA replication
(http://pfam.wustl.edu/cgi-bin/getdesc?name=PCNA).

Oxygenase-Related Sequences

[0197] Oxygenases are enzymes that catalyze the incorporation of
molecular oxygen into organic substances. Dioxygenases, also known as oxygen
transferases, catalyze the introduction of both atoms of molecular oxygen, and
typically contain iron. Monooxygenases, also known as mixed function oxygenases,
introduce one oxygen atom; the other is reduced to water. Examples of
oxygenase-related sequences include cytochrome oxygenases, heme oxygenases,
cyclooxygenases, lipoxygenases, and peptide-aspartate beta-dioxygenase.

[0198] Oxygenase-related sequences can possess or interact with alkyl
hydroperoxide reductase/thiol specific antioxidant (AhpC-TSA) domains, which are
responsible for providing a defense against sulfur-containing radicals; proteins that
possess this domain include allergens, e.g., asp f 3, mal f 2, and mal f 3
(http://pfam.wustl.edu/cgi-bin/getdesc?name=AhpC-TSA). Oxygenase-related
sequences can also possess or interact with monooxygenase domains, which are
protein domains that utilize flavin adenine dinucleotide (FAD)
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Monooxygenase). Oxygenase-related
sequences can also
possess or interact with dioxygenase domains, which are protein domains that
catalyze the incorporation of both atoms of molecular oxygen into substrates
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Dioxygenase).

Peroxidase-Related Sequences 

[0199] Peroxidases are enzymes that catalyze the reduction of
hydrogen peroxide. Peroxidases are generally located within peroxisomes, which are
intracellular organelles that metabolize fatty acids and toxic compounds. Disorders
associated with peroxidase-related sequences include X-linked adrenoleukodystrophy.
Examples of peroxidase-related sequences include glutathione peroxidases, thiol
peroxidases, catalases, horseradish peroxidases, anionic peroxidases, and thyroid
peroxidases.

[0200] Peroxidase-related sequences can possess or interact with alkyl
hydroperoxide reductase/thiol specific antioxidant (AhpC-TSA) domains, which are
protein domains that can reduce organic hydroperoxides
(http://pfam.wustl.edu/cgi-bin/getdesc?name=AhpC-TSA).

Phospholipase-Related Sequences

[0201] Phospholipases are enzymes that act on phospholipids. They
characteristically generate products that are active in signal transduction pathways.
For example, phospholipase C hydrolyzes phosphatidylinositol bisphosphate (PIP2) to
generate the two intracellular mediators, inositol trisphosphate (IP3) and
diacylglycerol. IP3 releases Ca2+ from stores in the endoplasmic reticulum, increasing
the cytosolic Ca2+ concentration. Diacylglycerol remains in the plasma membrane
and activates protein kinase C.

[0202] Phospholipase activity is involved in the synthesis of eicosanoids,
inflammatory mediators that include prostaglandins, prostacyclins, thromboxanes, and
leukotrienes. Corticosteroid hormones, such as cortisone, for example, inhibit
phospholipase activity in the first step of the eicosanoid synthesis pathway.
Corticosteroid hormones are widely used clinically to treat noninfectious
inflammatory diseases, such as some forms of arthritis (Ribardo et al., 2002).

[0203] Phospholipids play a pivotal role in the modulation of intestinal
inflammation. The mucosal surface of the digestive tract functions as a regulatory
barrier between the gastrointestinal lumen and the underlying mucosal immune
system. PhosphoUpidshelp.preseryefhemucpMfo^ 

physiological damage to the lumen, thus preventing invasion of harmful luminal 
factors into the host, which subsequently may lead to inflammation, or a pathological
immune response, both promoting and inhibiting gastrointestinal inflammation and
immunity (Sturm and Dignass, 2002).

[0204] Phospholipase-related sequences can possess or interact with
lysophospholipase catalytic (PLA2_B) domains, which catalyze the release of fatty
acids from lysophospholipids (http://pfam.wustl.edu/cgi-bin/getdesc?name=PLA2_B).
Phospholipase-related sequences can also possess or interact with
phospholipase/carboxylesterase (abhydrolase_2) domains, which have broad substrate
specificity (http://pfam.wustl.edu/cgi-bin/getdesc?name=abhydrolase_2).
Phospholipase-related sequences can also possess or interact with
lipase/acylhydrolase (Lipase_GDSL) domains, which are present in lipolytic enzymes
with serine in the active site
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Lipase_GDSL).

Prosaposin-Related Sequences

[0205] Saposins are small lysosomal proteins that activate lysosomal
lipid-degrading enzymes, including enzymes that metabolize sphingosine. They
typically isolate lipids from their membrane surroundings, and increase their
accessibility to degradative enzymes. Mammalian saposins are synthesized as a
single precursor molecule, prosaposin, which becomes an active saposin following
proteolytic activation. Examples of prosaposin-related sequences include saposin A,
saposin B, and saposin C. Disorders associated with prosaposin-related sequences
include neurodegenerative diseases similar to Tay-Sachs and Sandhoff
diseases, e.g., Gaucher's disease, which is described above.

[0206] Prosaposin-related sequences can possess or interact with
saposin-A (SAPA) domains, saposin B1 (SapB_1) domains, and saposin B2 (SapB_2)
domains, which are described above.

Proteasome-Related Sequences 

[0207] Proteasomes are intracellular complexes that degrade proteins.
Proteasomes recognize proteins that have been marked for destruction by the addition
of a ubiquitin molecule, unfold these ubiquitinated proteins, cleave them into small
peptides of 6-12 amino acids, and release them into the cytosol (Mitch and Goldberg,
1996). Examples of proteasome-related sequences include 26S proteasome subunits,
26S proteasome regulatory chains, and ubiquitin.

[0208] Proteasome-related sequences can possess or interact with
proteasome/cyclosome repeat (PC_rep) domains, which are protein domains that are
present in regulatory subunits of the proteasome
(http://pfam.wustl.edu/cgi-bin/getdesc?name=PC_rep). Proteasome-related sequences
can also possess or
interact with Mov34/MPN/PAD-1 family (Mov34) domains, which are protein
domains found at the N-terminus of regulatory subunits of the proteasome
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Mov34).

Reductase-Related Sequences 

[0209] Reductases are enzymes that catalyze reduction reactions, i.e.,
reactions in which hydrogen is combined with a molecule, or reactions in which
oxygen is removed from a molecule. Examples of reductases include dehydrogenase
reductases, oxidoreductases, quinone reductases, CoA reductases, dihydrofolate
reductases, tetrahydrofolate reductases, carbonyl reductases, nitrate reductases,
epoxide reductases, NADP(+) reductases, ribonucleotide reductases, and thioredoxin
reductases (Loeffen et al., 1998).

[0210] Reductase-related sequences can possess or interact with short
chain dehydrogenase (adh_short) domains, which are present in a wide variety of
proteins (http://pfam.wustl.edu/cgi-bin/getdesc?name=adh_short). Reductase-related
sequences can possess or interact with NADH-Ubiquinone oxidoreductase (complex
I), chain 5 N-terminus (oxidored_q1_N) domains, which are protein domains that
catalyze the transfer of electrons from NADH to ubiquinone in a reaction that can be
associated with proton translocation across a membrane
(http://pfam.wustl.edu/cgi-bin/getdesc?name=oxidored_q1_N).

Reverse Transcriptase-Related Sequences 

[0211] Reverse transcriptases are enzymes that make double stranded
DNA copies from single stranded nucleic acid template molecules. Typically, a
reverse transcriptase is a DNA polymerase that can copy both RNA and DNA
templates, and has an integral RNase H activity (Lim et al., 2002). The two
enzymatic domains of reverse transcriptase reflect these two activities; the first is a
DNA polymerase domain that can use either RNA or DNA as a template to synthesize
either the minus strand or the plus strand of DNA, and the second is an RNase H
domain that degrades the RNA in RNA-DNA hybrids (Coffin, 1997; Wu and Gallo,
1975).

[0212] Reverse transcriptase plays a role in the replication of some
viruses, e.g., retroviruses. It copies the retroviral RNA genome to produce a single
minus strand of DNA, then catalyzes the synthesis of a complementary plus strand.
Accordingly, reverse transcriptase is a therapeutic target for conditions that involve
retroviruses, e.g., Acquired Immune Deficiency Syndrome (AIDS). A number of
anti-retroviral drugs inhibit reverse transcriptase (Frank, 2002).
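
The two copying steps attributed to reverse transcriptase, an RNA template copied into a minus strand of DNA and the minus strand then copied into a complementary plus strand, can be illustrated by base pairing alone. The following is a minimal sketch in Python under simplifying assumptions: it models only Watson-Crick complementarity, ignores priming, strand transfer and RNase H degradation, and uses hypothetical function names and a made-up nine-base template.

```python
# Base-pairing tables; each strand is written 5' to 3'.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def minus_strand(rna_template):
    """DNA strand complementary to an RNA template (reverse complement)."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template.upper()))


def plus_strand(minus_dna):
    """DNA strand complementary to the minus strand (reverse complement)."""
    return "".join(DNA_TO_DNA[base] for base in reversed(minus_dna.upper()))


# Hypothetical RNA template, for illustration only.
rna = "AUGGCCAUU"
cdna_minus = minus_strand(rna)
cdna_plus = plus_strand(cdna_minus)
print(cdna_minus)  # AATGGCCAT
print(cdna_plus)   # ATGGCCATT
```

The plus strand carries the same base order as the RNA template with T in place of U, which is why the double stranded product preserves the sequence information of the original genome.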

[0213] Reverse transcriptase is also a standard scientific research tool in
the field of molecular biology. The reverse transcriptase polymerase chain reaction
(RTPCR) amplifies specific DNA sequences rapidly, and in vitro. RTPCR can detect
trace amounts of RNA and DNA, and is used in a wide range of applications,
including forensics, the diagnosis of genetic diseases, determination of the prognosis
of diagnosed diseases, and the detection of viral infection (Alberts et al., 1994). For
example, reverse transcriptase is used to diagnose cancer (Rowland, 2002), and to
provide prognostic information about the predicted survival of patients with prostate
cancer (Kantoff et al., 2001).

[0214] An example of a reverse transcriptase is telomerase, a general
tumor marker with a reverse transcriptase catalytic subunit (Kirkpatrick and Mokbel,
2001). Most human somatic cells do not express the telomerase reverse transcriptase
gene; conversely, most cancer cells express this gene (Ducrest et al., 2002; Kyo et al.,
2000). The human telomerase reverse transcriptase promoter has been placed in gene
therapy vectors that specifically target telomerase-positive tumor cells, and spare
nearby telomerase-negative cells (Pan and Koeneman, 1999). Human telomerase
reverse transcriptase is also recognized as a tumor antigen that can be a target for
immunotherapeutic approaches to cancer (Gordan and Vonderheide, 2002).

[0215] Reverse transcriptase-related sequences can possess or interact 
with rvt, Transposase_22, WD40, and Exo_endo_phos domains, all of which are
described above. 

Ribosome-Related Sequences 

[0216] A ribosome is a particle comprised of ribosomal proteins and
ribosomal RNA that catalyzes protein synthesis from messenger RNA. Ribosomes
are composed of two subunits, the large (L) subunit and the small (S) subunit. The
typical mammalian ribosome comprises four RNA molecules and approximately
eighty different proteins, which are highly conserved among prokaryotes and
eukaryotes, and perform a variety of tasks related to protein synthesis, e.g.,
coordinating protein synthesis in a manner that maintains cell homeostasis
(Yoshihama et al., 2002; Kenmochi et al., 1998).

[0217] Ribosomal proteins can perform functions independent of their
involvement in protein synthesis. For example, they are involved in cell-cycle
progression, e.g., as cell cycle checkpoints, and mediators of homologous
recombination, embryogenesis, and skeletal development (Yoshihama et al., 2002;
Chen and Ioannou, 1999). They also contribute to the regulation of cell growth,
transformation, and death, and can induce apoptosis (Chen and Ioannou, 1999; Naora
et al., 1999). Mutations in ribosomal proteins are associated with human diseases,
including Down syndrome, Diamond-Blackfan anemia, Turner syndrome, and
Noonan syndrome (Yoshihama et al., 2002).

[0218] Ribosomal proteins have been grouped into protein families on the
basis of sequence similarities in functional domains. One family of ribosomal
proteins, the ribosomal protein L11, RNA binding (Ribosomal_L11) domain, is
comprised of members that possess the L11 RNA binding domain; this family
includes the ribosomal proteins L11 and L12, which are components of the large
subunit. L11 is a protein of 140 to 165 amino acids that binds to a 23S RNA
molecule, the C-terminal region of which is buried within the ribosomal structure
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Ribosomal_L11). Another family of
large ribosomal subunit proteins possesses the ribosomal protein L13e
(Ribosomal_L13e) domain, which is found in a wide range of vertebrates and in
lower-order species (http://pfam.wustl.edu/cgi-bin/getdesc?name=Ribosomal_L13e),
as is the ribosomal protein L44 (Ribosomal_L44) domain
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Ribosomal_L44).

[0219] Additional ribosomal protein families encompass small subunit
proteins. The ribosomal protein S6e (Ribosomal_S6e) domain is present in a family
of proteins which includes protein kinase substrates that control cell growth and
proliferation by selectively translating particular classes of mRNA
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Ribosomal_S6e). The ribosomal
protein S8e (Ribosomal_S8e) domain is present in a family of proteins comprising
approximately 220 amino acids in eukaryotes, and about 125 amino acids in
archaebacteria (http://pfam.wustl.edu/cgi-bin/getdesc?name=Ribosomal_S8e). The
ribosomal protein S10p/S20e (Ribosomal_S10) domain is present in a family of
proteins which includes the small ribosomal subunit S10 from prokaryotes and S20
from eukaryotes (http://pfam.wustl.edu/cgi-bin/getdesc?name=Ribosomal_S10). S10
is involved in binding transfer RNA to the ribosome, and also operates as a
transcriptional elongation factor.

RNase-Related Sequences 

[0220] RNases are enzymes that cleave RNA. RNases generally
recognize their targets by tertiary structure, rather than by sequence; they include
exonucleases, which remove the terminal base in an RNA sequence, and
endonucleases, which can cleave non-terminal bases. Examples of RNases include
RNase E, which is involved in the formation of 5S ribosomal RNA from
pre-ribosomal RNA; RNase F, which cleaves both viral and host RNA in response to
interferons, inhibiting protein synthesis; RNase H, which is specific for the RNA
strand of an RNA-DNA hybrid; RNase P, which generates transfer RNA from
precursor transcripts; and RNase T, which removes the terminal AMP from
nonaminoacylated tRNA (Coffin et al., 1997).

[0221] RNase-related sequences can possess or interact with rvt, rve,
RNase H, and gag_p30 domains, all of which are described above.

RNase H-Related Sequences

[0222] RNase H is a nuclease specific for the RNA strand of an
RNA-DNA hybrid that cleaves phosphodiester bonds to produce molecules with 3'-OH and
5'-PO4 ends. Multiple forms of RNase H are present in both prokaryotes and
eukaryotes. RNase H may be part of larger polypeptides and its activity can be
influenced by other regions of these polypeptides (Coffin et al., 1997; Crouch, 1990).

[0223] During retroviral replication, RNase H activity forms 
oligonucleotides that prime DNA synthesis. Therefore, the RNase H activity of 
reverse transcriptase is a target for therapeutic intervention. For example, small 
molecule inhibitors of retroviral RNase H function have shown promise in managing
HIV infection (Klarman et al., 2002).

[0224] Another therapeutic indication for RNase H is the regulation of
cancer genes by targeting mRNA translation. Antisense deoxyoligonucleotides
down-regulate mRNA expression by annealing to specific regions of an mRNA. Formation
of the DNA:RNA heteroduplex then triggers mRNA cleavage by RNase H. Cleavage
is rapidly followed by further degradation, irreversibly preventing translation of the
target mRNA. Antisense deoxyoligonucleotides that trigger RNase H activity can
thus be used as cancer therapeutic agents (Crooke, 1996; Curcio et al., 1997).
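
As a hedged illustration of the annealing step described above, the following sketch locates the regions of an mRNA to which a given antisense deoxyoligonucleotide is perfectly complementary, i.e., the candidate sites of DNA:RNA heteroduplex formation. It assumes perfect complementarity and ignores mRNA secondary structure and hybridization thermodynamics. The names `target_sequence` and `find_target_sites`, and the example sequences, are hypothetical.

```python
# Illustrative sketch only: exact-match search for antisense annealing sites.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def target_sequence(oligo: str) -> str:
    """Return the RNA sequence (5'->3') to which a DNA oligo (5'->3') anneals."""
    return "".join(DNA_TO_RNA[base] for base in reversed(oligo.upper()))


def find_target_sites(mrna: str, oligo: str) -> list[int]:
    """Return 0-based start positions in the mRNA where the oligo pairs perfectly."""
    site = target_sequence(oligo)
    mrna = mrna.upper()
    last_start = len(mrna) - len(site)
    return [i for i in range(last_start + 1) if mrna[i:i + len(site)] == site]


if __name__ == "__main__":
    # Hypothetical mRNA fragment and antisense oligo.
    print(find_target_sites("GGAUGGCUAAAUGGCU", "AGCCAT"))
```

Each reported position marks a stretch of mRNA that would lie within the heteroduplex and would therefore be a candidate substrate for RNase H under the stated assumptions.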

[0225] RNase H-related sequences can possess or interact with rnaseH,
Gag_p30, rvt, and rve domains, all of which are described above.

SH3-Related Sequences

[0226] Src homology region 3 (SH3) is a polypeptide domain commonly
found in intracellular signaling proteins; it binds with moderate affinity and selectivity
to proline-rich ligands. SH3 domains are heterogeneous; different SH3 domains bind
to different proline-rich sequences (Gmeiner and Horita, 2001). SH3 domains are
involved in a wide variety of biological processes, including mediating the assembly
of large multiprotein complexes, regulating enzyme activity, and modulating the local
concentration or subcellular localization of signaling pathway components (Mayer,
2001). Examples of SH3-related sequences include phosphotyrosine receptors,
membrane associated guanylate kinases, mitogen-activated protein kinases, myosin 1,
the Crk adaptor protein, phospholipase C-γ, Grb2, Sos, src-SH3, Abl-SH3, the Nck
adaptor, and alpha-spectrin-SH3.

[0227] SH3-related sequences can possess or interact with SH3
domains, which are protein domains of approximately 50-70 amino acids, and are
present in a large number of proteins involved in intracellular signaling
(http://pfam.wustl.edu/cgi-bin/getdesc?name=SH3). SH3-related sequences can also possess or
interact with SH3 domain-binding protein 5 (SH3BP5) domains, which are protein
domains that act as a substrate for c-Jun N-terminal kinase
(http://pfam.wustl.edu/cgi-bin/getdesc?name=SH3BP5).

Stem Cell-Related Sequences 

[0228] Stem cells are pluripotent or multipotent cells that generate maturing
cells in multiple differentiation lineages. Pluripotent cells have the capacity to
differentiate into each and every cell present in the organism. Embryonic stem cells
are pluripotent; they can differentiate into any of the cells present in the adult.
Multipotent cells have the ability to differentiate into more than one cell type.
Organ-specific stem cells are multipotent; they can differentiate into any of the cells of the
organ they inhabit.

[0229] When they divide in vivo, both pluripotent and multipotent stem cells
can maintain their pluripotency or multipotency while giving rise to differentiated
progeny. Thus, stem cells can produce replicas of themselves which are pluri- or
multipotent, and are also able to differentiate into lineage-restricted committed
progenitor cells. For example, hematopoietic stem cells, which are multipotent cells
specifically able to form blood cells, can divide to produce replicate hematopoietic
stem cells. They can also divide to produce more highly differentiated cells, which
are precursors of blood cells. The precursors differentiate, sometimes through several
generations of cells, into blood cells. A hematopoietic stem cell can also divide into a
cell with the capacity to form, for example, a relatively undifferentiated cell that is
committed to differentiate into, e.g., granulocytes, or erythrocytes, or another type of
blood cell.

[0230] Stem cells can also reproduce and differentiate in vitro. Embryonic
stem cells have been directed to differentiate into cardiac muscle cells in vitro and,
alternatively, into early progenitors of neural stem cells, and then into mature neurons
and glial cells in vitro (Trounson, 2002).

[0231] Stem cell therapy is effective in treating cancer in humans (... et
al., 2001), and offers several advantages over traditional cancer therapies (Weissman,
2000). One advantage of stem cell therapy exists when used in conjunction with
radiation therapy. In radiation therapy for cancer, the dose of radiation necessary to
kill the cancer cells in an organ can also be sufficient to destroy the healthy cells of
the organ. In combined stem cell and radiation therapy, an organ is first treated with
sufficient radiation to destroy all of the cancer cells and most or all of the healthy
cells, but then stem cells are infused to repopulate the organ. In the ensuing weeks, as
the cancer cells and healthy cells die, the stem cells replace the healthy cells. Another
advantage of this approach, compared to heterologous organ transplants, is that there
is no risk of rejection, since stem cells do not provoke an immune response. A further
advantage is that stem cells are inherently programmed to regulate their numbers and
differentiation status, i.e., once provided to the patient, the necessary number will
differentiate, and the rest will remain undifferentiated (Weissman, 2000).

[0232] Stem cell therapy is also effective in treating autoimmune disease in
humans. For example, immunosuppression in conjunction with stem-cell
transplantation has induced remission in patients with refractory, severe rheumatic
autoimmune disease (Van Laar and Tyndall, 2003). Patients with rheumatoid
arthritis, systemic lupus erythematosus, systemic sclerosis, and juvenile idiopathic
arthritis have benefited from stem cell transplants (Van Laar and Tyndall, 2003).

[0233] Preclinical studies also suggest the potential of stem cell
transplantation for the treatment of neural and muscular injuries and disorders,
including those of the central nervous system, peripheral nervous system, and skeletal,
cardiac and smooth muscle (Deasy and Huard, 2002). Stem cells transplanted into the
bone marrow of mice migrate to the site of injured muscle and differentiate into new
muscle cells. For example, patients with myasthenia gravis, muscular dystrophies,
amyotrophic lateral sclerosis, congestive heart failure, Parkinson's disease, and
Alzheimer's disease may benefit from stem cell therapy (Henningson, 2003).

[0234] In addition to therapeutic uses, research using stem cells can provide
useful information about normal stem cell function and the pathogenesis of disease.
Stem cells derived from a patient with a genetic disease can provide a tool for
studying that disease. To derive these stem cells, a somatic cell, i.e., a cell that is not
in the oocyte or spermatocyte lineage, is donated by the patient, and the nucleus is
removed and transferred to an unfertilized human oocyte. This nuclear transplant
procedure produces, at the blastocyst stage of development, embryonic stem cells
with the same set of genes as the patient with the genetic disease. Studying these
cells, and their progeny in vitro, permits analysis of a specific model of the disease.
For example, placing stem cells derived from a patient with a genetic disorder under
the control of various stem cell regulatory factors can elicit abnormal responses from
the affected stem cells compared to stem cells derived from a healthy individual's
somatic nucleus.

[0235] Embryonic stem cell-related sequences can possess or interact with
the stem cell factor (SCF) domain, a transmembrane domain having a soluble,
secreted form, which is involved in hematopoiesis, and which binds to and activates a
receptor tyrosine kinase, stimulating the proliferation of mast cells and augmenting
the proliferation of myeloid and lymphoid hematopoietic progenitors in bone marrow
culture (http://pfam.wustl.edu/cgi-bin/getdesc?name=SCF).

[0236] Certain stem cell-related sequences can possess the ability to maintain
the stem cell in an undifferentiated state while allowing cell proliferation. Such
compositions can be useful in ex vivo cell therapy to expand populations of cells for
cell replacement therapy.

[0237] Certain stem cell-related sequences can possess the ability to cause
cell differentiation to a relatively mature cell type and are useful in in vivo or ex vivo
therapy to compensate for a deficiency of such relatively mature cell type.

Synthetase-Related Sequences 

[0238] A synthetase is an enzyme that catalyzes the synthesis of a
molecule. Synthetases comprise a broad class of enzymes; they catalyze the synthesis
of nucleic acids, peptides, and lipids (Agou et al., 1996). Examples of synthetases
include lysyl-tRNA synthetase, asparaginyl-tRNA synthetase, holocarboxylase
synthetase, carbamyl phosphate synthetase I, and argininosuccinate synthetase.

[0239] Synthetase-related sequences can possess or interact with transfer
RNA synthetase domains, which are protein domains that activate amino acids and
transfer them to specific transfer RNA molecules as a step in protein biosynthesis
(http://pfam.wustl.edu/cgi-bin/getdesc?name=tRNA-synt_2). The 20 aminoacyl-tRNA
synthetases are divided into class I and class II, each of which contains multiple
synthetases with different specificities. For example, there is a protein domain
involved in asparagine, aspartic acid, and lysine synthesis
(http://pfam.wustl.edu/cgi-bin/textsearch?terms=trna-synt&search_what=all&sections=DE&sections=CC&size=100).
Synthetase-related sequences can also possess or
interact with lipid-A-disaccharide synthetase (LpxB)
domains that catalyze the synthesis of disaccharides
(http://pfam.wustl.edu/cgi-bin/getdesc?name=LpxB).

TATA Box-Related Sequences

[0240] A TATA box is a consensus sequence in the promoter region of
many eucaryotic genes that binds a general transcription factor and plays a role in
specifying the position for transcription initiation. TATA boxes are generally found
approximately 25 nucleotides before the site of transcription initiation (Chalut et al.,
1995). Examples of TATA box-related sequences include TATA box binding
protein, 13 TATA/TBP, and small nuclear RNA-activating protein 190 Myb DNA.

[0241] TATA box-related sequences can possess or interact with
transcription factor TFIID, also known as the TATA-binding protein (TBP) domain,
which is a protein domain that specifically binds to the TATA box promoter element
(http://pfam.wustl.edu/cgi-bin/getdesc?name=TBP). TATA box-related sequences
can also possess or interact with HMG14 and HMG17 (HMG14_17) domains, which
are members of a family of high mobility group proteins, described above
(http://pfam.wustl.edu/cgi-bin/getdesc?name=HMG14_17).

Tat-Related Sequences

[0242] Tat is a human immunodeficiency virus (HIV) protein involved
in viral production of new RNA genomes and new complete viral particles. Tat is
also involved in AIDS pathogenesis; it plays a role in reactivating latent viruses, e.g.,
the JC retrovirus; it is involved in the development of AIDS-related Kaposi's
sarcoma; and it depresses the function of, and induces apoptosis in, helper CD4 cells
(Yu et al., 1995). Examples of Tat-related sequences include Tat-associated proteins,
e.g., T^, HIV-1 Rev, and tat-associated kinase (also known as positive transcriptional
elongation factor b).

[0243] Tat-related sequences can possess or interact with
transactivating regulatory protein (Tat) domains, which
contribute to efficient transcription of a viral genome
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Tat). Tat-related sequences can also possess or interact with
mitochondrial glycoprotein (MAM33) domains, which are protein domains found in
mitochondrial matrix proteins, and which can be involved in mitochondrial oxidative
phosphorylation and in interactions between the nucleus and the mitochondria
(http://pfam.wustl.edu/cgi-bin/getdesc?name=MAM33).

Transferase-Related Sequences

[0244] Transferases are enzymes that transfer a designated group of
atoms from a donor molecule to an acceptor molecule. For example, acyl transferases
transfer acyl groups, methyl transferases transfer methyl groups, nucleotidyl
transferases transfer nucleotides, prenyltransferases transfer prenyl groups, and
glycosyl transferases transfer glycosyl groups (Lin et al., 1996). Examples of
transferases include acetyltransferases, hydroxymethyltransferases, sialyltransferases,
arginine N-methyltransferase, glucuronosyltransferase, NTP-transferase, and
GDP-pyrophosphorylase B.

[0245] Transferase-related sequences possess or interact with
UDP-glucuronosyl and UDP-glucosyl transferase domains, which are protein domains
found in a superfamily of enzymes that catalyze the addition of the glycosyl group
from a UTP-sugar to a small hydrophobic molecule
(http://pfam.wustl.edu/cgi-bin/getdesc?name=UDPGT). Transferase-related sequences also possess or interact
with nucleotide transferase (NTP_transferase) domains, which are protein domains
that transfer nucleotides onto phosphorylated sugars
(http://pfam.wustl.edu/cgi-bin/getdesc?name=NTP_transferase).

Transposase-Related Sequences 

[0246] Transposases are site-specific recombination enzymes that
catalyze the transposition of a segment of DNA from one part of the genome to
another. The movable segments are called transposable elements; each transposable
element is occasionally moved by a transposase, which functions as an integrase, by
inserting DNA sequences into other DNA sequences. Transposases are often encoded
by the DNA of the transposable element itself. Transposases bind specifically to
terminal inverted repeats of 10-500 bp that are characteristically part of transposable
elements (Smit and Riggs, 1996). They catalyze both cutting and pasting of a
transposable element from one segment of the genome to another. Sequences related
to transposases can have other functions, e.g., as transcription factors, or in the
assembly of centromere proteins (Smit and Riggs, 1996). Examples of
transposase-related sequences include mariner, pogo, hobo, tigger, MER37, Galileo, Occan^
Impala, Tn MERIl, MsqTc3, and the Sleeping Beauty transposon system (Robertson
and Zumpano, 1997; Robertson, 1996; Smit and Riggs, 1996).

[0247] Transposase-related sequences can possess or interact with a
transposase 1 (Transposase_1) domain, which is characterized by sequences that can
excise and/or insert mobile genetic elements such as transposons or insertion
sequences; for example, mariner possesses a transposase 1 domain
(http://pfam.wustl.edu/cgi-bin/getdesc?name=Transposase_1). Transposase-related
sequences can also possess or interact with L1 transposable element (Transposase_22)
domains, which have been described above. Transposase-related sequences can also
possess or interact with a DDE endonuclease (DDE) domain, which is responsible for
coordinating metal ions needed for endonuclease catalytic activity
(http://pfam.wustl.edu/cgi-bin/getdesc?name=DDE). Transposase-related sequences can additionally
possess or interact with a zinc finger, C2H2 type (zf-C2H2) domain, which binds
nucleic acids using a mechanism that involves coordinating a zinc atom with a pair of
cysteine residues and a pair of histidine residues
(http://pfam.wustl.edu/cgi-bin/getdesc?name=zf-C2H2). Transposase-related sequences can also possess or
interact with a reverse transcriptase (rvt) domain, and/or a low-density lipoprotein
receptor (Idljcece) domain, both of which are described above.

Ubiquitin-Related Sequences 

[0248] Ubiquitin is a protein found in all eucaryotic cells examined to
date. When it is linked to the lysine side chain of a protein by the formation of an
amide bond with its C-terminal glycine, ubiquitin renders the ubiquitin-bound protein
subject to rapid proteolysis in the proteasome. In addition to its role in the selective
degradation of cellular proteins, ubiquitin also plays a role in maintaining
chromosome structure, regulating gene expression, responding to stresses on the
organism, and ribosome biogenesis. Examples of
ubiquitin-related sequences include elongins, ubiquitin-specific proteases,
ubiquitin-calmodulin ligase, ubiquitin carrier protein kinase, ubiquitin N-alpha-protein
hydrolase, and the small ubiquitin-related modifier (Sumo-1) (Kamitani et al., 1997).

[0249] Ubiquitin-related sequences can possess or interact with a
ubiquitin domain, which is a conserved sequence of approximately 76 amino acid
residues that comprise the protein ubiquitin
(http://pfam.wustl.edu/cgi-bin/getdesc?name=ubiquitin). Ubiquitin-related sequences can also possess or
interact with a ubiquitin carboxyl-terminal hydrolase (UCH) domain, which is a protein
domain that comprises a thiol protease that recognizes and hydrolyses the peptide
bond at the C-terminal glycine of ubiquitin
(http://pfam.wustl.edu/cgi-bin/getdesc?name=UCH).

Virus-Related Sequences

[0250] The human chromosome has integrated endogenous genes that
are related to viral genes. Some endogenous viral genes, e.g., the retroviral HERV-W
family, are widely and heterogeneously dispersed among human chromosomes
(Voisset et al., 2000; Everett et al., 1997; Werner et al., 1990). Endogenous
proviruses are usually transcriptionally silent, but are expressed under certain
conditions (Coffin et al., 1997). Endogenous viral expression can be specific to host
factors, such as cell type or stage of differentiation, as well as other factors including
the position on the chromosome, the influence of cis-acting sequences, or the presence
of host-mediated DNA methylation (Coffin).

[0251] Endogenous viral expression can have a number of
consequences, both beneficial and detrimental. Among the beneficial consequences is
the ability of endogenous retroviruses to confer resistance to infection by exogenous
viruses. For example, mice with endogenous mouse mammary tumor virus (MMTV)
can be immune to exogenous infection (Golovkina et al., 1992). Among the
detrimental effects is a causative role in disease. Evidence indicates an association
of endogenous viruses with cancers and autoimmune diseases (Coffin et al.,
1997). For example, spontaneous tumors of specific origin, murine mammary
adenocarcinomas, and murine T-cell lymphomas have been associated with the
presence of specific endogenous retroviruses. Furthermore, a transformed phenotype
is associated with the increased transcription of certain classes of endogenous viral
elements (Coffin et al., 1997). With respect to autoimmune disease, an endogenous
virus that influences the immunoregulatory process has been associated with
spontaneous autoimmune thyroiditis in a chicken model of human Hashimoto disease
(Wick et al., 1987). Examples of viral-related proteins include hepatitis B virus
x-interacting protein, herpesvirus associated ubiquitin-specific protease, and
Coxsackievirus and adenovirus receptor precursor.

[0252] Viral-related sequences can possess or interact with rvt, rve, and
gag_p30 sequences, all of which are described above.

Zinc Finger-Related Sequences

[0253] A zinc finger domain is a small, self-folding, structural motif of 25 to
30 amino-acid residues present in many nucleic acid-binding proteins. It is comprised
of a polypeptide loop held in a hairpin bend and bound to a zinc atom, and includes
two conserved cysteine and two conserved histidine residues. Many classes of zinc
fingers have been characterized according to the number and positions of the
conserved histidine and cysteine residues. The amino acid configuration that holds
the zinc atom in a tetrahedral array has a finger-like projection that interacts with
nucleotides in the major groove of the bound nucleic acid. Zinc finger motifs have
conserved regions near the zinc molecule, and variable regions at the nucleic acid
binding site that provide specificity for the nucleic acid sequences they bind. Zinc
finger proteins have a variety of functions, including as transcription regulators and
intracellular receptors. Zinc finger domains are also involved in protein-protein
interactions, e.g., those involving protein kinase C. Recently, zinc finger nucleases
have been used to target genes for gene replacement by homologous recombination
(Bibikova et al., 2003). Examples of zinc finger proteins include XC3H-3b, the
transcription factor Slug, and transcription factor IIIA.
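
By way of illustration only, a motif with two conserved cysteines followed by two conserved histidines can be located in an amino acid sequence with a simple pattern search. The spacing used in the sketch below, C-x(2,4)-C-x(12)-H-x(3,5)-H, is an assumption modeled on commonly cited C2H2 consensus spacings; it is not taken from this specification, and a profile-based search would be more sensitive. The function name `find_c2h2_motifs` and the example sequence are hypothetical.

```python
import re

# Assumed, simplified C2H2 spacing: C-x(2,4)-C-x(12)-H-x(3,5)-H.
# This is an illustrative approximation, not a definitive domain model.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping candidate motif."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein.upper())]


if __name__ == "__main__":
    # Synthetic sequence constructed to contain one candidate motif.
    print(find_c2h2_motifs("MKCPECGKAFSRSDELTRHIRIHTG"))
```

A match indicates only that the cysteine and histidine residues occur at compatible spacings; it does not establish zinc binding or nucleic acid binding.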

[0254] Zinc finger-related sequences can possess or interact with a zinc
finger C2H2 type (zf-C2H2) domain, which binds a zinc atom with two cysteine and
two histidine residues, and is utilized, e.g., in RNA transcription
(http://pfam.wustl.edu/cgi-bin/getdesc?name=zf-C2H2). Zinc finger-related sequences can also possess
or interact with a C3HC4 type, RING finger (zf-C3HC4) domain, which is a
specialized type of zinc finger domain comprised of 40 to 60 amino acids that binds
two zinc atoms; variants of RING-finger domains include the C3HC4-type and the
C3H2C3-type (http://pfam.wustl.edu/cgi-bin/getdesc?name=zf-C3HC4). Proteins
with RING-finger domains have developmental and functional roles; they are
involved in intracellular receptor binding, and in mediating protein-protein
interactions (Gray et al., 2000). RING-finger domains can exhibit ubiquitin-protein
ligase activity, and can bind to E2 ubiquitin-conjugating enzymes.

[0255] Zinc finger-related sequences can also possess or interact with a zinc
knuckle (zf-CCHC) domain, which is an 18-amino acid zinc finger domain found in
RNA-binding and single strand DNA-binding proteins; they are often involved in
eukaryotic gene regulation (http://pfam.wustl.edu/cgi-bin/getdesc?name=zf-CCHC).
Zinc knuckles are also found in retroviral gag and nucleocapsid proteins, where they
function in genome packaging, and early in the infection process. Zinc finger-related
sequences can also possess or interact with a BTB/POZ (BTB) domain, which
mediates both homomeric and heteromeric protein dimerization
(http://pfam.wustl.edu/cgi-bin/getdesc?name=BTB). Zinc finger-related sequences can also possess or
interact with NF-X1 type zinc finger (zf-NF-X1) domains, which are found in the
transcriptional repressor NF-X1, where they repress transcription of HLA-DRA, and
in the shuttle craft protein, which plays a role in late stage embryonic neurogenesis
(http://pfam.wustl.edu/cgi-bin/getdesc?name=zf-NF-X1). Zinc finger-related
sequences can also possess or interact with a KRAB box (KRAB) domain, also
known as a Kruppel-associated box, which is comprised of approximately 75 amino
acids, enriched in charged amino acids, and involved in protein-protein interactions
(http://pfam.wustl.edu/cgi-bin/getdesc?name=KRAB). KRAB domains can function
as transcription factors, e.g., as a transcriptional repressor, and can assume roles in
cell differentiation and development (Aubry et al., 1992; Lovering and Trowsdale,
1991). Zinc finger-related sequences can possess or interact with a traris^
domain, which is described above.

Industrial Applicability

[0256] The invention provides sequences related to secreted sequences,
single-transmembrane sequences, multiple-transmembrane sequences, kinase-related
sequences, ligase-related sequences, nuclear hormone receptor-related sequences,
phosphatase-related sequences, protease-related sequences, phosphodiesterase-related
sequences, kinesin-related sequences, immunoglobulin-related sequences, T-cell
receptor-related sequences, glycosylphosphatidylinositol anchor-related sequences,
and sequences related to other nucleic acid and amino acid sequences of the invention,
including activators, adaptors, adhesion molecules, ATPases, ATP, breakpoints,
channels, checkpoints, complexes, dehydrogenases, disintegrins, endopeptidases,
germ-cells, GTPases, helicases, hydrolases, integrases, integrins, isomerases,
membranes, mucins, oxygenases, peroxidases, phospholipases, prosaposins,
proteasomes, reductases, reverse transcriptases, RNases, RNases H, SH3, synthetases,
TATA boxes, Tat proteins, transferases, transposases, ubiquitins, and viruses. The
invention provides for novel polynucleotides, related novel polypeptides and active
fragments thereof, as well as novel nucleic acid compositions encoding these
polypeptides, compositions comprising the related polypeptides, and methods for their
use.

[0257] The present invention also provides for vectors, host cells, and
methods for producing the polynucleotides and polypeptides of the invention in these
vectors and host cells. The present invention further provides for antisense molecules
that are capable of regulating the expression of the polynucleotides or polypeptides
herein. In addition, modulators, including antibodies that bind specifically to the
polypeptides or modulate the activity of the polypeptides, are also provided.

[0258] The present polynucleotides, polypeptides, and modulators find
use in therapeutic agent screening/discovery applications, such as screening for
receptors or competitive ligands, for use, for example, as small molecule therapeutic
drugs. Also provided are methods of modulating a biological activity of a polypeptide
and methods of treating associated disease conditions, particularly by administering
modulators of the present polypeptides, such as small molecule modulators, antisense
molecules, and specific antibodies.

[0259] The present polypeptides, polynucleotides, and modulators find
use in a number of diagnostic, prophylactic, and therapeutic applications. The
polynucleotides and polypeptides of the invention can be detected by methods
provided herein; these methods are useful in diagnosis, and can be accomplished by
the use of diagnostic kits. The polynucleotides and polypeptides of the invention are
useful for treating a variety of disorders, including cancer, proliferative disorders,
inflammatory disorders, immune disorders, viral disorders, and other metabolic
disorders. For example, subjects who suffer from a deficiency, or a lack of a
particular protein, or are otherwise in need of such protein to repair or enhance a
desirable function, benefit from the administration of a protein or an active fragment
thereof by any conventional routes of administration. These include therapeutic
vaccines in the form of nucleic acid or polypeptide vaccines, such as cancer vaccines,
where the vaccines can be administered alone, such as naked DNA, or can be
facilitated, such as via viral vectors, microsomes, or liposomes. Therapeutic
antibodies include those that are administered alone or in combination with cytotoxic
agents, such as radioactive or chemotherapeutic agents.

[0260] In particular, the polypeptides, polynucleotides, and modulators 
of the present invention can be used to treat cancers, including, but not limited to, 
cancers of the prostate, breast, bone, soft tissue, liver, kidney, ovary, cervix, skin, 
pancreas, and brain, as well as leukemias, lymphomas, lung cancers such as 
adenocarcinomas and squamous cell carcinoma, and cancers of gastrointestinal organs 
such as stomach, colon, and rectum. Further, the polypeptides, polynucleotides, and 
modulators of the present invention can be used to treat inflammatory, immune, 
bacterial, viral, and metabolic diseases, disorders, syndromes, or conditions, 
including, but not limited to, intestinal inflammation and immunity, autoimmune 
thyroiditis, and retroviral infections, as well as tissue and/or organ hypertrophy. 

Disclosure of the Invention 
[0261] The present invention features an isolated polynucleotide that 
encodes a polypeptide. In some embodiments, the polypeptide has at least about 70%, 
at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least 
about 95%, at least about 97%, at least about 98%, or at least about 99% amino acid 
sequence identity with an amino acid sequence derived from a polynucleotide 
sequence chosen from at least one nucleotide sequence according to SEQ ID NOS.: 1 
- 209 and 419 - 627. In some embodiments, the polypeptide has an amino acid 
sequence chosen from at least one amino acid sequence according to SEQ ID NOS.: 
210 - 418. In many embodiments, the polypeptide has at least one activity associated 
with the naturally occurring encoded polypeptide. 

[0262] In some embodiments, the polypeptide includes a signal peptide. In 
alternative embodiments, the polypeptide comprises a mature form of a protein, from 
which the signal peptide has been cleaved. In other embodiments, the polypeptide is a 
signal peptide. In a further aspect, the invention provides fragments of a polypeptide 
chosen from at least one amino acid sequence according to SEQ ID NOS.: 210 - 418, 
where each fragment is an extracellular fragment of the polypeptide, or an 
extracellular fragment of the polypeptide minus the signal peptide. The invention 
provides an N-terminal fragment containing a Pfam domain and a C-terminal 
fragment containing a Pfam domain, and either or both may be biologically active. 
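
By way of illustration only, the relationship between a precursor polypeptide, its signal peptide, and the mature form from which the signal peptide has been cleaved may be sketched in Python as follows. The function name `split_signal_peptide`, the 1-based inclusive position convention, and the example residues are assumptions made for this sketch; they are not taken from the Sequence Listing or the Tables.

```python
from typing import Tuple


def split_signal_peptide(precursor: str, sp_start: int, sp_end: int) -> Tuple[str, str]:
    """Return (signal_peptide, mature_form) for 1-based inclusive positions.

    Assumes the signal peptide is N-terminal, so the mature form is
    everything that follows the last signal peptide residue.
    """
    if not (1 <= sp_start <= sp_end <= len(precursor)):
        raise ValueError("signal peptide positions fall outside the sequence")
    signal_peptide = precursor[sp_start - 1:sp_end]
    mature_form = precursor[sp_end:]
    return signal_peptide, mature_form


# Illustrative precursor: an 18-residue leader followed by 12 further residues.
precursor = "MKWVTFISLLLLFSSAYS" + "RGVFRRDTHKSE"
signal, mature = split_signal_peptide(precursor, 1, 18)
print(signal)
print(mature)
```

In this sketch the "polypeptide minus the signal peptide" is simply the second element of the returned pair.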

[0263] In yet other embodiments, the polypeptides function as secreted 
proteins. In yet further embodiments, the polypeptides function as single- 
transmembrane proteins. In yet further embodiments, the polypeptides function as 
multiple-transmembrane proteins. In yet further embodiments, the polypeptides 
function as kinases. In yet further embodiments, the polypeptides function as protein 
kinases. In yet further embodiments, the polypeptides function as ligases. In yet 
further embodiments, the polypeptides function as nuclear hormone receptors. In yet 
further embodiments, the polypeptides function as phosphatases. In yet further 
embodiments, the polypeptides function as proteases. In yet further embodiments, the 
polypeptides function as phosphodiesterases. In yet further embodiments, the 
polypeptides function as kinesins. In yet further embodiments, the polypeptides 
function as immunoglobulins. In yet further embodiments, the polypeptides function 
as T-cell receptors. In yet further embodiments, the polypeptides function as 
glycosylphosphatidylinositol anchors. 

[0264] In yet further embodiments, the polypeptides function as cytokines. 
In still further embodiments, the polypeptides function as immune cells. In further 
embodiments, the polypeptides function as antigens. In yet further embodiments, the 
polypeptides function as receptors. In other embodiments, the polypeptides function 
as binding proteins. In other embodiments, the polypeptides function as factors. In 
further embodiments, the polypeptides function as growth factors. In further 
embodiments, the polypeptides function as heat-shock proteins. In some 
embodiments, the polypeptides function as membrane transport proteins. In yet 
further embodiments, the polypeptides function as ribosomal proteins. In some 
embodiments, the polypeptides function as zinc fingers. In some embodiments, the 
polypeptides function as embryonic stem cell-related peptides. In still further 
embodiments, the polypeptides function in pathological states. In other embodiments, 
the polypeptides function as one or more of these. 

[0265] In yet further embodiments, the polypeptides function as activators. 
In yet further embodiments, the polypeptides function as adaptors. In yet further 
embodiments, the polypeptides function as adhesion molecules. In yet further 
embodiments, the polypeptides function as ATPases. In yet further embodiments, the 
polypeptides function as ATP-related polypeptides. In further embodiments, the 
polypeptides function as channel-related polypeptides. In yet further embodiments, 
the polypeptides function as checkpoint-related polypeptides. In yet further 
embodiments, the polypeptides function as complexes. In yet further embodiments, 
the polypeptides function as dehydrogenases. In yet further embodiments, the 
polypeptides function as disintegrins. In yet further embodiments, the polypeptides 
function as endopeptidases. In yet further embodiments, the polypeptides function as 
germ-cells. In yet further embodiments, the polypeptides function as GTPases. In yet 
further embodiments, the polypeptides function as helicases. In yet further 
embodiments, the polypeptides function as hydrolases. In yet further embodiments, 
the polypeptides function as integrases. In yet further embodiments, the polypeptides 
function as integrins. In yet further embodiments, the polypeptides function as 
isomerases. In yet further embodiments, the polypeptides function as membranes. In 
yet further embodiments, the polypeptides function as [...]. In yet further 
embodiments, the polypeptides function as oxygenases. In yet further embodiments, 
the polypeptides function as peroxidases. In some embodiments, the polypeptides 
function as phospholipases. In yet further embodiments, the polypeptides function as 
prosaposins. In yet further embodiments, the polypeptides function as proteasomes. 
In yet further embodiments, the polypeptides function as reductases. In other 
embodiments, the polypeptides function as reverse transcriptase-related polypeptides. 
In yet further embodiments, the polypeptides function as RNases. In further 
embodiments, the polypeptides function as RNaseH-related polypeptides. In yet 
further embodiments, the polypeptides function as SH3-related polypeptides. In yet 
further embodiments, the polypeptides function as synthetases. In yet further 
embodiments, the polypeptides function as TATA box-related polypeptides. In yet 
further embodiments, the polypeptides function as TAT-related polypeptides. In yet 
further embodiments, the polypeptides function as transferases. In yet further 
embodiments, the polypeptides function as transposases. In yet further embodiments, 
the polypeptides function as ubiquitin-related polypeptides. In yet further 
embodiments, the polypeptides function as virus-related polypeptides. In other 
embodiments, the polypeptides function as one or more of these. 

[0266] The present invention features an isolated polynucleotide that 
hybridizes under stringent hybridization conditions to a coding region of at least one 
nucleotide sequence shown in SEQ ID NOS.: 1 - 209, 419 - 627, or a complement 
thereof. 

[0267] The present invention features an isolated polynucleotide that shares 
at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least 
about 90%, at least about 95%, at least about 97%, at least about 98%, or at least about 
99% nucleotide sequence identity with a nucleotide sequence of the coding region of 
at least one sequence shown in SEQ ID NOS.: 1 - 209, 419 - 627, or a complement 
thereof. In some embodiments, a subject polynucleotide has the nucleotide sequence 
shown in at least one of SEQ ID NOS.: 1 - 209, 419 - 627, or a coding region thereof. 
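
The percent sequence identity thresholds recited above can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned to equal length, with `-` marking gaps, and it counts identical columns over all aligned columns. Both the function name `percent_identity` and this choice of denominator are assumptions for illustration; alignment programs differ in how they treat gaps, and the sketch is not the method by which identity is determined herein.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue in both sequences.

    Inputs must be pre-aligned and of equal length; '-' denotes a gap.
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical 10-nucleotide sequences differing at a single position.
identity = percent_identity("ATGGCCAAGT", "ATGGCTAAGT")
print(identity >= 90.0)
```

Under these assumptions a sequence would meet the "at least about 90%" threshold when the returned value is 90.0 or higher.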

[0268] The present invention also features a vector, e.g., a recombinant 
vector, that includes a subject polynucleotide, and a promoter that drives its 
expression. This vector can transform a host cell, and the present invention further 
features such host cells, e.g., isolated in vitro host cells, and in vivo host cells, that 
comprise a polynucleotide of the invention, or a recombinant vector of the invention. 

[0269] The present invention further features a library of polynucleotides, 
wherein at least one of the polynucleotides comprises the sequence information of a 
polynucleotide of the invention. In specific embodiments, the library is provided on a 
nucleic acid array. In some embodiments, the library is provided in computer- 
readable format. 

[0270] The present invention features a pair of isolated nucleic acid 
molecules, each from about 10 to about 200 nucleotides in length. The first nucleic 
acid molecule of the pair comprises a sequence of at least 10 contiguous nucleotides 
having 100% sequence identity to at least one nucleic acid sequence shown in SEQ ID 
NOS.: 1 - 209 and 419 - 627. The second nucleic acid molecule of the pair comprises 
a sequence of at least 10 contiguous nucleotides having 100% sequence identity to the 
reverse complement of at least one nucleic acid sequence shown in SEQ ID NOS.: 1 - 
209 and 419 - 627. The sequence of said second nucleic acid molecule is located 3' of 
the nucleic acid sequence of the first nucleic acid molecule shown in SEQ ID NOS.: 1 
- 209 and 419 - 627. The pair of isolated nucleic acid molecules are useful in a 
polymerase chain reaction or in any other method known in the art to amplify a 
nucleic acid that has sequence identity to the sequences shown in SEQ ID NOS.: 1 - 
209 and 419 - 627, particularly when cDNA is used as a template. 
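
The geometry of such a pair of nucleic acid molecules can be sketched in Python as shown below. The sketch treats each molecule as matching the template over its entire length, which is a simplification of the "at least 10 contiguous nucleotides" language above. The helper names `reverse_complement` and `is_valid_primer_pair` and the template sequence are hypothetical and do not come from the Sequence Listing.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def is_valid_primer_pair(template: str, first: str, second: str,
                         min_len: int = 10, max_len: int = 200) -> bool:
    """Check that `first` matches the template strand, that `second` matches
    its reverse complement, and that the second site lies 3' of the first."""
    for molecule in (first, second):
        if not (min_len <= len(molecule) <= max_len):
            return False
    first_pos = template.find(first)
    if first_pos == -1:
        return False
    # On the template strand, the second molecule's site reads as its
    # reverse complement; search only downstream (3') of the first site.
    second_site = reverse_complement(second)
    return template.find(second_site, first_pos + len(first)) != -1


# Hypothetical 39-nucleotide template assembled from three segments.
template = "ATGGCCAAGTTC" + "GACCTGAAGGGT" + "ACCCTGATCGAATAA"
first = template[:12]
second = reverse_complement(template[-12:])
print(is_valid_primer_pair(template, first, second))
```

A pair that passes this check flanks the intervening region in the orientation needed for amplification, although practical primer design would also consider melting temperature and specificity, which the sketch ignores.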

[0271] The invention features a method of determining the presence of a 
polynucleotide substantially identical to a polynucleotide sequence shown in the 
Sequence Listing, or a complement of such a nucleotide by providing its complement, 
allowing the polynucleotides to interact, and determining whether such interaction has 
occurred. 

[0272] The invention further features methods of regulating the expression 
of the subject polynucleotides and encoded polypeptides. The invention provides a 
method of inhibiting transcription or translation of a first polynucleotide encoding a 
first polypeptide of the invention by providing a second polynucleotide that 
hybridizes to the first polynucleotide, and allowing the first polynucleotide to contact 
and bind to the second polynucleotide. The second polynucleotide can be chosen 
from an antisense molecule, a ribozyme, and an interfering RNA (RNAi) molecule. 

[0273] The present invention further features an isolated polypeptide, e.g., an 
isolated polypeptide encoded by a polynucleotide, and biologically active fragments 
of such polypeptide. In some embodiments, the polypeptide is a fusion protein. In 
some embodiments, the polypeptide has one or more amino acid substitutions, and/or 
insertions and/or deletions, compared with at least one sequence shown in SEQ ID 
NOS.: 210 - 418. In some embodiments, the polypeptide has an amino acid sequence 
derived from at least one nucleotide sequence shown in SEQ ID NOS.: 1 - 209 and 
419 - 627. In some embodiments, the polypeptide has an amino acid sequence 
substantially identical to at least one sequence shown in SEQ ID NOS.: 210 - 418. 

[0274] The invention also provides a method of making a polypeptide of the 
invention by providing a nucleic acid molecule that comprises a polynucleotide 
sequence encoding a polypeptide of the invention, introducing the nucleic acid 
molecule into an expression system, and allowing the polypeptide to be produced. 

[0275] In some embodiments, the method involves in vitro cell-free 
transcription and/or translation. For example, the expression system can comprise a 
cell-free expression system, such as an E. coli system, a wheat germ extract system, a 
rabbit reticulocyte system, or a frog oocyte system. 

[0276] In certain other embodiments, the expression system can comprise a 
prokaryotic or eukaryotic cell, for example, a bacterial cell expression system, a 
fungal cell expression system, such as yeast or Aspergillus, a plant cell expression 
system, e.g., a cereal plant, a tobacco plant, a tomato plant, or other edible plant, an 
insect cell expression system, such as SF9 or High Five cells, an amphibian cell 
expression system, a reptile cell expression system, a crustacean cell expression 
system, an avian cell expression system, a fish cell expression system, or a 
mammalian cell expression system, such as one using Chinese Hamster Ovary (CHO) 
cells. In some embodiments, the method involves culturing a subject host cell under 
conditions such that the subject polypeptide is produced by the host cells; and 
recovering the subject polypeptide from the culture, e.g., from within the host cells, or 
from the culture medium. In further embodiments, the polypeptide can be produced 
in vivo in a multicellular animal or plant, comprising a polynucleotide encoding the 
subject polypeptide. 

[0277] The present invention further features a non-human animal injected 
with at least one polynucleotide comprising at least one nucleotide sequence chosen 
from SEQ ID NOS.: 1 - 209 and 419 - 627, and/or at least one polypeptide comprising 
at least one amino acid sequence chosen from SEQ ID NOS.: 210 - 418. 

[0278] The invention further provides a kit comprising one or more of a 
polynucleotide or polypeptide, which may include instructions for its use. Such kits 
are useful in diagnostic applications, for example, to detect the presence and/or level 
of a polypeptide in a biological sample. 

Modes for Carrying Out the Invention 

Brief Description of the Tables 

[0279] Each sequence shown in Tables 1-3 is identified by a Five Prime 
Therapeutics, Inc. (FP) identification number (FP ID). Table 1 specifies the predicted 
number of amino acid residues in each FP protein of the invention (Length, Predicted 
Protein). Table 1 also specifies the percent of the FP sequence that is covered by the 
public National Center for Biotechnology Information (NCBI) database (Prediction 
Covered by Public). Table 1 also describes the characteristics of the protein in the 
NCBI database displaying the greatest degree of similarity to each claimed sequence. 
This protein is described by its NCBI accession number (Top Hit Accession No.), and 
by the NCBI's annotation of that sequence (Top Hit Annotation). 

[0280] Table 2 describes the characteristics of the human protein in the 
NCBI database with the greatest degree of similarity to each claimed sequence. The 
predicted number of amino acids of this human protein is specified (Length, Human 
Top Hit). Table 2 also specifies any existing protein family (Pfam) classification for 
these human sequences. Table 2 specifies the result of the algorithm described above 
that predicts whether the claimed FP sequence is secreted (Tree Vote, Secreted). 
Table 2 sets forth the position of the amino acid residues comprising the signal 
peptide sequences (SP Positions) of the claimed FP sequences. Table 2 also specifies 
the position(s), if any, of the amino acid residues comprising the transmembrane 
domains in each claimed FP sequence (TM domains), and the number of 
transmembrane domains of each claimed FP sequence (TM Total). 

[0281] Table 3 describes the characteristics of the Fantom mouse protein 
with the greatest degree of similarity to the claimed sequences. The Fantom database 
was compiled by the Fantom Consortium and is accessible, for example, at 
http://fantom.gsc.riken.go.jp/db/ (Bono et al., 2002). It provides curated functional 
annotation to full-length mouse sequences (Okazaki et al., 2002). The similarities of 
the claimed sequences of the invention with the annotated sequences in Tables 1-3 
suggest that they may share structural and functional properties, and exhibit similar 
expression profiles and localizations. 

Definitions 

[0282] "Related sequences" include nucleotide and amino acid sequences 
that are involved in the function of their referent. For example, "receptor-related 
sequences" include all sequences that are involved in receptor function. This 
includes, but is not limited to, sequences that are involved in receptor synthesis, 
receptor regulation, receptor effector function, and receptor degradation. "Related 
sequences" also encompass complementary nucleic acid sequences, and biologically 
active fragments of nucleic acid and amino acid sequences. 

[0283] The terms "polynucleotide," "nucleotide," "nucleic acid," 
"polynucleic molecule," "nucleotide molecule," "nucleic acid molecule," "nucleic acid 
sequence," "polynucleotide sequence," and "nucleotide sequence" are used 
interchangeably herein to refer to polymeric forms of nucleotides of any length. The 
polynucleotides can contain deoxyribonucleotides, ribonucleotides, and/or their 
analogs or derivatives. For example, nucleic acids can be naturally occurring DNA or 
RNA, or can be synthetic analogs, as known in the art. The terms also encompass 
genomic DNA, genes, gene fragments, exons, introns, regulatory sequences or 
regulatory elements (such as promoters, enhancers, initiation and termination regions, 
other control regions, expression regulatory factors, and expression controls), DNA 
comprising one or more single-nucleotide polymorphisms (SNPs), allelic variants, 
isolated DNA of any sequence, and cDNA. The terms also encompass mRNA, tRNA, 
rRNA, ribozymes, splice variants, antisense RNA, antisense conjugates, RNAi, and 
isolated RNA of any sequence. The terms also encompass recombinant 
polynucleotides, heterologous polynucleotides, branched polynucleotides, labeled 
polynucleotides, hybrid DNA/RNA, polynucleotide constructs, vectors comprising the 
subject nucleic acids, nucleic acid probes, primers, and primer pairs. The 
polynucleotides can comprise modified nucleic acid molecules, with alterations in the 
backbone, sugars, or heterocyclic bases, such as methylated nucleic acid molecules, 
peptide nucleic acids, and nucleic acid molecule analogs, which may be suitable as, 
for example, probes if they demonstrate superior stability and/or binding affinity 
under assay conditions. Analogs of purines and pyrimidines, including radiolabeled 
and fluorescent analogs, are known in the art. The polynucleotides can have any 
three-dimensional structure, and can perform any function, known or as yet unknown. 
The terms also encompass single-stranded, double-stranded and triple helical 
molecules that are either DNA, RNA, or hybrid DNA/RNA and that may encode a 
full-length gene or a biologically active fragment thereof. Biologically active 
fragments of polynucleotides can encode the polypeptides herein, as well as anti-sense 
and RNAi molecules. Thus, the full length polynucleotides herein may be treated 
with enzymes, such as Dicer, to generate a library of short RNAi fragments which are 
within the scope of the present invention. 
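
As a purely computational illustration of a library of short fragments derived from a full length sequence, the following Python sketch enumerates every fixed-length window along a transcript. It does not model the enzymology of Dicer, which acts on double-stranded RNA and typically yields duplexes of roughly 21 to 23 nucleotides. The function name `tile_fragments`, the default length of 21, and the example sequence are assumptions for illustration.

```python
from typing import List


def tile_fragments(rna: str, length: int = 21, step: int = 1) -> List[str]:
    """Enumerate candidate fragments of a fixed length along a sequence.

    Returns an empty list when the sequence is shorter than `length`.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [rna[i:i + length] for i in range(0, len(rna) - length + 1, step)]


# Hypothetical 30-nucleotide transcript.
transcript = "AUGGCCAAGUUCGACCUGAAGGGUACCCUG"
library = tile_fragments(transcript)
print(len(library))
```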

[0284] The novel polynucleotides herein include those shown in the Tables, 
SEQ ID NOS.: 1 - 209 and 419 - 627, as well as those that encode the polypeptides of 
SEQ ID NOS.: 210 - 418, and biologically active fragments thereof. The 
polynucleotides also include modified, labeled, and degenerate variants of the nucleic 
acid sequences, as well as nucleic acid sequences that are substantially similar or 
homologous to nucleic acids encoding the subject proteins. 

[0285] A "biologically active" entity, or an entity having "biological 
activity," is one having structural, regulatory, or biochemical functions of a naturally 
occurring molecule or any function related to or associated with a metabolic or 
physiological process. Biologically active polynucleotide fragments are those 
exhibiting activity similar, but not necessarily identical, to an activity of a 
polynucleotide of the present invention. The biological activity can include an 
improved desired activity, or a decreased undesirable activity. For example, an entity 
demonstrates biological activity when it participates in a molecular interaction with 
another molecule, or when it has therapeutic value in alleviating a disease condition, 
or when it has prophylactic value in inducing an immune response to the molecule, or 
when it has diagnostic value in determining the presence of the molecule, such as a 
biologically active fragment of a polynucleotide that can be detected as unique for the 
polynucleotide molecule, or that can be used as a primer in PCR. 

[0286] The term "degenerate variant" of a nucleic acid sequence refers to all 
nucleic acid sequences that can be directly translated, according to the standard 
genetic code, to provide an amino acid sequence identical to that translated from a 
reference nucleic acid sequence. 
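
The definition of a degenerate variant can be made concrete with a short Python sketch that translates two coding sequences under the standard genetic code and compares the results. The function names `translate` and `is_degenerate_variant` and the example sequences are assumptions for illustration; the codon table is built from the conventional T, C, A, G ordering of the standard code, with `*` denoting a stop codon.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate an in-frame DNA or RNA coding sequence, codon by codon."""
    dna = coding_sequence.upper().replace("U", "T")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_degenerate_variant(candidate: str, reference: str) -> bool:
    """True when both sequences translate to the identical amino acid sequence."""
    return translate(candidate) == translate(reference)


# Hypothetical sequences: the candidate differs from the reference at two
# third-codon positions yet encodes the same residues.
reference = "ATGGCTAAATGA"
candidate = "ATGGCCAAGTGA"
print(is_degenerate_variant(candidate, reference))
```

Because most amino acids are specified by more than one codon, many distinct nucleic acid sequences satisfy this test for any given reference.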

[0287] The term "gene" or "genomic sequence" as used herein is an open 
reading frame encoding specific proteins and polypeptides, for example, an mRNA, 
cDNA, or genomic DNA, and also may or may not include intervening introns, or 
adjacent 5' and 3' non-coding nucleotide sequences involved in the regulation of 
expression up to about 20 kb beyond the coding region, and possibly further in either 
direction. A gene can be introduced into an appropriate vector for extrachromosomal 
maintenance or for integration into a host genome. 

[0288] The term "transgene" as used herein is a nucleic acid sequence that is 
incorporated into a transgenic organism. A "transgene" can contain one or more 
transcriptional regulatory sequences, and other sequences, such as introns, that may be 
useful for expressing or secreting the nucleic acid or fusion protein it encodes. 

[0289] The term "cDNA" as used herein is intended to include all nucleic 
acids that share the sequence elements of mature mRNA species, where sequence 
elements are exons and 3' and 5' non-coding regions. Generally, mRNA species have 
contiguous exons, the intervening introns having been removed by nuclear RNA 
splicing to create a continuous open reading frame encoding a protein. 

[0290] The term "splice variant" refers to all types of RNAs transcribed from 
a given gene that when processed collectively encode plural protein isoforms. The 
term "alternative splicing" and related terms refer to all types of RNA processing that 
lead to expression of plural protein isoforms from a single gene. Some genes are first 
transcribed as long mRNA precursors that are then shortened by a series of processing 
steps to produce the mature mRNA molecule. One of these steps is RNA splicing, in 
which the intron sequences are removed from the mRNA precursor. A cell can splice 
the primary transcript in different ways, making different "splice variants," and 
thereby making different polypeptide chains from the same gene, or from the same 
mRNA molecule. Splice variants can include, for example, exon insertions, exon 
extensions, exon truncations, exon deletions, alternatives in the 5' untranslated region 
and alternatives in the 3' untranslated region. 
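
One of the variant types named above, exon deletion, can be illustrated with a minimal Python sketch in which a mature transcript is assembled by joining a chosen subset of exons in their genomic order. The function name `splice` and the four short exon sequences are hypothetical; exon extensions, truncations, and untranslated-region alternatives are not modeled.

```python
from typing import Iterable, List


def splice(exons: List[str], include: Iterable[int]) -> str:
    """Join the selected exons (by index) in genomic order; introns are
    assumed to have been removed already."""
    return "".join(exons[i] for i in sorted(set(include)))


# Hypothetical gene with four exons.
exons = ["ATGGCC", "AAGTTC", "GACCTG", "TAA"]
full_length = splice(exons, [0, 1, 2, 3])
exon_2_skipped = splice(exons, [0, 2, 3])
print(full_length)
print(exon_2_skipped)
```

The two outputs differ only in the presence of the second exon, which is how a single gene in this sketch gives rise to two distinct transcripts.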

[0291] "Oligonucleotide" may generally refer to polynucleotides of between 
about 5 and about 100 nucleotides of single- or double-stranded nucleic acids. For the 
purposes of this disclosure, there is no upper limit to the length of an oligonucleotide. 
Oligonucleotides are also known as oligomers or oligos and can be isolated from 
genes, or chemically synthesized by methods known in the art. 

[0292] "Nucleic acid composition" as used herein is a composition 
comprising a nucleic acid sequence, including one having an open reading frame that 
encodes a polypeptide and is capable, under appropriate conditions, of being 
expressed as a polypeptide. The term includes, for example, vectors, including 
plasmids, cosmids, viral vectors (e.g., retrovirus vectors such as lentivirus, 
adenovirus, and the like), human, yeast, bacterial, P1-derived artificial chromosomes 
(HAC's, YAC's, BAC's, PAC's, etc.), and mini-chromosomes, in vitro host cells, in 
vivo host cells, tissues, organs, allogenic or congenic grafts or transplants, 
multicellular organisms, and chimeric, genetically modified, or transgenic animals 
comprising a subject nucleic acid sequence. 

[0293] An "isolated," "purified," or "substantially isolated" polynucleotide, 
or a polynucleotide in "substantially pure form," in "substantially purified form," in 
"substantial purity," or as an "isolate," is one that is substantially free of the sequences 
with which it is associated in nature, or other nucleic acid sequences that do not 
include a sequence or fragment of the subject polynucleotides. By substantially free 
is meant that less than about 90%, less than about 80%, less than about 70%, less than 
about 60%, or less than about 50% of the composition is made up of materials other 
than the isolated polynucleotide. For example, the isolated polynucleotide is at least 
about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 
90%, at least about 95%, at least about 97%, or at least about 99% free of 
materials with which it is associated in nature. For example, an isolated 
polynucleotide may be present in a composition wherein at least about 50%, at least 
about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 
95%, at least about 97%, or at least about 99% of the total macromolecules (for example, 
polypeptides, fragments thereof, polynucleotides, fragments thereof, lipids, 
polysaccharides, and oligosaccharides) in the composition is the isolated 
polynucleotide. Where at least about 99% of the total macromolecules is the isolated 
polynucleotide, the polynucleotide is at least about 99% pure, and the composition 
comprises less than about 1% contaminant. As used herein, an "isolated," "purified" 
or "substantially isolated" polynucleotide, or a polynucleotide in "substantially pure 
form," in "substantially purified form," in "substantial purity," or as an "isolate," also 
refers to recombinant polynucleotides, modified, degenerate and homologous 
polynucleotides, and chemically synthesized polynucleotides, which, by virtue of 
origin or manipulation, are not associated with all or a portion of a polynucleotide 
with which it is associated in nature, are linked to a polynucleotide other than that to 
which it is linked in nature, or do not occur in nature. For example, the subject 
polynucleotides are generally provided as other than on an intact chromosome, and 
recombinant embodiments are typically flanked by one or more nucleotides not 
normally associated with the subject polynucleotide on a naturally-occurring 
chromosome. 
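
The purity arithmetic described above, in which a composition is about 99% pure when about 99% of its total macromolecules is the isolated polynucleotide, can be sketched in Python as follows. The function name `percent_purity`, the component names, and the amounts are hypothetical, and the sketch assumes that all amounts are expressed in the same unit; the text does not specify whether the percentage is computed by mass or by number of molecules.

```python
from typing import Dict


def percent_purity(amounts: Dict[str, float], target: str) -> float:
    """Percent of the total macromolecules contributed by `target`."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("total amount must be positive")
    return 100.0 * amounts[target] / total


# Hypothetical composition, all amounts in the same unit (e.g., micrograms).
composition = {
    "isolated polynucleotide": 990,
    "polypeptides": 6,
    "lipids": 4,
}
purity = percent_purity(composition, "isolated polynucleotide")
contaminant = 100.0 - purity
print(purity, contaminant)
```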



[0294] The terms "polypeptide," "peptide," and "protein," used 
interchangeably herein, refer to a polymeric form of amino acids of any length, which 
can include naturally-occurring amino acids, coded and non-coded amino acids, 
chemically or biochemically modified, derivatized, or designer amino acids, amino 
acid analogs, peptidomimetics, and depsipeptides, and polypeptides having modified, 
cyclic, bicyclic, depsicyclic, or depsibicyclic peptide backbones. The term includes 
single chain protein as well as multimers. The term also includes conjugated proteins, 
fusion proteins, including, but not limited to, GST fusion proteins, fusion proteins 
with a heterologous amino acid sequence, fusion proteins with heterologous and 
homologous leader sequences, fusion proteins with or without N-terminal methionine 
residues, pegylated proteins, and immunologically tagged proteins. Also included in 
this term are variations of naturally occurring proteins, where such variations are 
homologous or substantially similar to the naturally occurring protein, as well as 
corresponding homologs from different species. Variants of polypeptide sequences 
include insertions, additions, deletions, or substitutions compared with the subject 
polypeptides. The term also includes peptide aptamers. 

[0295] The novel polypeptides herein include amino acid sequences encoded 
by an open reading frame (ORF) as shown in SEQ ID NOS.: 210 - 418, described in 
greater detail below, including the full length protein and fragments thereof, 
particularly biologically active fragments and/or fragments corresponding to 
functional domains, e.g., a signal peptide or leader sequence, an enzyme active site, 
including a cleavage site and an enzyme catalytic site, a domain for interaction with 
other protein(s), a domain for binding DNA, a regulatory domain, a consensus domain 
that is shared with other members of the same protein family, such as a kinase family 
or an immunoglobulin family; an extracellular domain that may act as a target for 
antibody production or that may be cleaved to become a soluble receptor or a ligand 
for a receptor; an intracellular fragment of a transmembrane protein that participates 
in signal transduction; a transmembrane domain of a transmembrane protein that may 
facilitate water or ion transport; a sequence associated with cell survival and/or cell 
proliferation; a sequence associated with cell cycle arrest, DNA repair and/or 
apoptosis; a sequence associated with a disease or disease prognosis, including types 
of cancer, degenerative disease, inflammatory disease, immunological disease, genetic 
disease, metabolic disease, and/or bacterial or viral infection; and including fusions of 
the subject polypeptides to other proteins or parts thereof; modifications of the subject 
polypeptide, e.g., comprising modified, derivatized, or designer amino acids, modified 
peptide backbones, and/or immunological tags; as well as intra- and inter-species 
homologs of the subject polypeptides. 

[0296] The term "bicyclic" refers to a peptide with two ring closures formed 
by covalent linkages between amino acids. A covalent linkage between two 
nonadjacent amino acids constitutes a ring closure, as does a second covalent linkage 
between a pair of adjacent amino acids which are already linked by a covalent peptide 
linkage. The covalent linkages forming the ring closures can be amide linkages, 
i.e., the linkage formed between a free amino on one amino acid and a free carboxyl 
of a second amino acid, or linkages formed between the side chains or "R" groups of 
amino acids in the peptides. Thus, bicyclic peptides can be "true" bicyclic peptides, 
i.e., peptides cyclized by the formation of a peptide bond between the N-terminus and 
the C-terminus of the peptide, or they can be "depsi-bicyclic" peptides, i.e., peptides 
in which the terminal amino acids are covalently linked through their side chain 
moieties. 

[0297] As noted above, a "biologically active" entity, or an entity having
"biological activity," is one having structural, regulatory, or biochemical functions of
a naturally occurring molecule or any function related to or associated with a
metabolic or physiological process. Biologically active polypeptide fragments are
those exhibiting activity similar, but not necessarily identical, to an activity of a
polypeptide of the present invention. The biological activity can include an improved
desired activity, or a decreased undesirable activity. For example, an entity
demonstrates biological activity when it participates in a molecular interaction with
another molecule, or when it has therapeutic value in alleviating a disease condition,
or when it has prophylactic value in inducing an immune response to the molecule, or
when it has diagnostic value in determining the presence of the molecule. A
biologically active polypeptide or fragment thereof includes one that can participate in
a biological reaction, for example, as a transcription factor that combines with other
transcription factors for initiation of transcription, or that can serve as an epitope or
immunogen to stimulate an immune response, such as production of antibodies, or
that can transport molecules into or out of cells, or that can perform a catalytic
activity, for example polymerization or nuclease activity, or that can participate in
signal transduction by binding to receptors, proteins, or nucleic acids, activating
enzymes or substrates.

[0298] A "signal peptide," or a "leader sequence," comprises a sequence of
amino acid residues, typically at the N terminus of a polypeptide, which directs the
intracellular trafficking of the polypeptide. Polypeptides that contain a signal peptide
or leader sequence typically also contain a signal peptide or leader sequence cleavage
site. Such polypeptides, after cleavage at the cleavage sites, generate mature
polypeptides, for example, after extracellular secretion or after being directed to the
appropriate intracellular compartment.

[0299] "Depsipeptides" are compounds containing a sequence of at least two
alpha-amino acids and at least one alpha-hydroxy carboxylic acid, which are bound
through at least one normal peptide link and ester links, derived from the hydroxy
carboxylic acids. "Linear depsipeptides" can comprise rings formed through S-S
bridges, or through an hydroxy or a mercapto group of an hydroxy- or mercapto-
amino acid and the carboxyl group of another amino- or hydroxy-acid, but do not
comprise rings formed only through peptide or ester links derived from hydroxy
carboxylic acids. "Cyclic depsipeptides" are peptides containing at least one ring
formed only through peptide or ester links, derived from hydroxy carboxylic acids.

[0300] An "isolated," "purified," or "substantially isolated" polypeptide, or a
polypeptide in "substantially pure form," in "substantially purified form," in
"substantial purity," or as an "isolate," is one that is substantially free of the materials
with which it is associated in nature or other polypeptide sequences that do not
include a sequence or fragment of the subject polypeptides. By substantially free is
meant that less than about 90%, less than about 80%, less than about 70%, less than
about 60%, or less than about 50% of the composition is made up of materials other
than the isolated polypeptide. For example, the isolated polypeptide is at least about
50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at
least about 95%, at least about 97%, or at least about 99% free of the materials with
which it is associated in nature. For example, an isolated polypeptide may be present
in a composition wherein at least about 50%, at least about 60%, at least about 70%,
at least about 80%, at least about 90%, at least about 95%, at least about 97%, or at
least about 99% of the total macromolecules (for example, polypeptides, fragments
thereof, polynucleotides, fragments thereof, lipids, polysaccharides, and
oligosaccharides) in the composition is the isolated polypeptide. Where at least about
99% of the total macromolecules is the isolated polypeptide, the polypeptide is at least
about 99% pure, and the composition comprises less than about 1% contaminant. As
used herein, an "isolated," "purified," or "substantially isolated" polypeptide, or a
polypeptide in "substantially pure form," in "substantially purified form," in
"substantial purity," or as an "isolate," also refers to recombinant polypeptides,
modified, tagged and fusion polypeptides, and chemically synthesized polypeptides,
which by virtue of origin or manipulation, are not associated with all or a portion of
the materials with which they are associated in nature, are linked to molecules other
than that to which they are linked in nature, or do not occur in nature.

[0301] Detection methods of the invention can be qualitative or quantitative.
Thus, as used herein, the terms "detection," "identification," "determination," and the
like, refer to both qualitative and quantitative determinations, and include
"measuring." For example, detection methods include methods for detecting the
presence and/or level of polynucleotide or polypeptide in a biological sample, and
methods for detecting the presence and/or level of biological activity of
polynucleotide or polypeptide in a sample.

[0302] As used herein, the terms "array" and "microarray" may be used
interchangeably and refer to a collection of plural biological molecules such as
nucleic acids, polypeptides, or antibodies, having locatable addresses that may be
separately detectable. Generally, "microarray" encompasses use of sub-microgram
quantities of biological molecules. The biological molecules may be affixed to a
substrate or may be in solution or suspension. The substrate can be porous or solid,
planar or non-planar, unitary or distributed, such as a glass slide, a 96 well plate, with
or without the use of microbeads or nanobeads. As such, the term "microarray"
includes all of the devices referred to as microarrays in Schena, 1999; Bassett et al.,
1999; Bowtell, 1999; Brown and Botstein, 1999; Chakravarti, 1999; Cheung et al.,
1999; Cole et al., 1999; Collins, 1999; Debouck and Goodfellow, 1999; Duggan et al.,
1999; Hacia, 1999; Lander, 1999; Lipshutz et al., 1999; Southern et al., 1999;
Schena, 2000; Brenner et al., 2000; Lander, 2001; Steinhaur et al., 2002; and Espejo et
al., 2002. Nucleic acid microarrays include both oligonucleotide arrays (DNA chips)
containing expressed sequence tags ("ESTs") and arrays of larger DNA sequences
representing a plurality of genes bound to the substrate, either one of which can be
used for hybridization studies. Protein and antibody microarrays include arrays of
polypeptides or proteins, including but not limited to, polypeptides or proteins
obtained by purification, fusion proteins, and antibodies, and can be used for specific
binding studies (Zhu and Snyder, 2003; Houseman et al., 2002; Schaeferling et al.,
2002; Weng et al., 2002; Winssinger et al., 2002; Zhu et al., 2001; Zhu et al., 2001;
and MacBeath and Schreiber, 2000).

[0303] A "nucleic acid hybridization reaction" is one in which single strands
of DNA or RNA randomly collide with one another, and bind to each other only when
their nucleotide sequences have some degree of complementarity. The solvent and
temperature conditions can be varied in the reactions to modulate the extent to which
the molecules can bind to one another. Hybridization reactions can be performed
under different conditions of "stringency." The "stringency" of a hybridization
reaction as used herein refers to the conditions (e.g., solvent and temperature
conditions) under which two nucleic acid strands will either pair or fail to pair to form
a "hybrid" helix.

[0304] "Tm" is the temperature in degrees Celsius at which 50% of a
polynucleotide duplex made of complementary strands of nucleic acids that are
hydrogen bonded in an anti-parallel direction by Watson-Crick base pairing dissociates
into single strands under conditions of the hybridization reaction. Tm can be predicted
according to a standard formula, such as: Tm = 81.5 + 16.6 log[X+] + 0.41 (%G/C) -
0.61 (%F) - 600/L, where [X+] is the cation concentration (usually sodium ion, Na+) in
mol/L; (%G/C) is the number of G and C residues as a percentage of total residues in
the duplex; (%F) is the percent formamide in solution (wt/vol); and L is the number of
nucleotides in each strand of the paired nucleic acids.
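
The Tm formula above can be evaluated directly. The following is a minimal sketch, not part of the original disclosure; the function names predicted_tm and gc_percent are hypothetical, and the logarithm is assumed to be base 10, as is conventional for this formula.

```python
import math


def gc_percent(sequence: str) -> float:
    """Return G and C residues as a percentage of total residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(1 for base in seq if base in "GC") / len(seq)


def predicted_tm(cation_molar: float, gc_pct: float,
                 formamide_pct: float, length_nt: int) -> float:
    """Tm = 81.5 + 16.6 log[X+] + 0.41 (%G/C) - 0.61 (%F) - 600/L.

    Assumption: "log" is the base-10 logarithm.
    cation_molar is [X+] in mol/L; length_nt is L, the strand length.
    """
    if cation_molar <= 0 or length_nt <= 0:
        raise ValueError("cation concentration and length must be positive")
    return (81.5
            + 16.6 * math.log10(cation_molar)
            + 0.41 * gc_pct
            - 0.61 * formamide_pct
            - 600.0 / length_nt)


if __name__ == "__main__":
    # Illustrative inputs only: 1 mol/L cation, 50% G/C, no formamide, L = 600.
    print(round(predicted_tm(1.0, 50.0, 0.0, 600), 2))
```

Under these assumptions, lowering the cation concentration or adding formamide lowers the predicted Tm, which is how the solvent conditions modulate stringency.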

[0305] A "buffer" is a system that tends to resist change in pH when a given
increment of hydrogen ion or hydroxide ion is added. Buffered solutions contain
conjugate acid-base pairs. Any conventional buffer can be used with the inventions
herein including but not limited to,
bicarbonate.

[0306] A "library" of polynucleotides comprises a collection of sequence
information of a plurality of polynucleotide sequences, which information is provided
in either biochemical form (e.g., as a collection of polynucleotide molecules), or in
electronic form (e.g., as a collection of polynucleotide sequences stored in a
computer-readable form, as in a computer-based system, a computer data file, and/or
as part of a computer program).

[0307] A "library" of polypeptides comprises a collection of sequence
information of a plurality of polypeptide sequences, which information is provided in,
e.g., a collection of polypeptide sequences stored in a computer-readable form, as in a
computer-based system, a computer data file, and/or as part of a computer program.

[0308] "Media" refers to a manufacture, other than an isolated nucleic acid
molecule, that contains the sequence information of the present invention. Such a 
manufacture provides the genome sequence or a subset thereof in a form that can be 
examined by means not directly applicable to the sequence as it exists in a nucleic 
acid, e.g., with computer-readable media comprising data storage structures. Such 
media include, but are not limited to: magnetic storage media, such as a floppy disc, a 
hard disc storage medium, and a magnetic tape; optical storage media such as CD- 
ROM; electrical storage media such as RAM and ROM; and hybrids of these 
categories such as magnetic/optical storage media. 

[0309] "Recorded" refers to a process for storing information on computer
readable media, using any such methods as known in the art.

[0310] As used herein, "a computer-based system" refers to the hardware
means, software means, and data storage means used to analyze the nucleotide
sequence information of the present invention. The minimum hardware of the
computer-based systems of the present invention comprises a central processing unit
(CPU), input means, output means, and data storage means. A skilled artisan can
readily appreciate that any one of the currently available computer-based systems is
suitable for use in the present invention. The data storage means can comprise any
manufacture comprising a recording of the present sequence information as described
above, or a memory access means that can access such a manufacture.

[0311] "Search means" refers to one or more programs implemented on the
computer-based system, to compare a target sequence or target structural motif, or
expression levels of a polynucleotide in a sample, with the stored sequence
information. A variety of algorithms are publicly known and commercially
available, e.g., MacPattern (EMBL), BLAST, BLASTN and BLASTX (NCBI),
gapped BLAST, BLAZE, the Wise package, FASTX, ClustalW, FASTA, FASTA3,
Align0, TCoffee, BestFit, FastDB, and TeraBLAST (TimeLogic, Crystal Bay,
Nevada). Search means can be used to identify fragments or regions of the genome
that match a particular target sequence or target motif, for example, based on
sequence similarity, for example, to identify open reading frames (ORFs) within the
genome that contain homology to ORFs from other organisms.

[0312] "Sequence similarity," "sequence homology," "homology," "sequence
identity," and "percent sequence identity," used interchangeably herein, describe the
degree of relatedness between two polynucleotide or polypeptide sequences. In
general, "identity" means the exact match-up of two or more nucleotide sequences or
two or more amino acid sequences, where the nucleotides or amino acids being
compared are the same. Also, in general, "similarity" or "homology" means the exact
match-up of two or more nucleotide sequences or two or more amino acid sequences,
where the nucleotides or amino acids being compared are either the same or possess
similar chemical and/or physical properties. The terms also refer to the percentage of
the "aligned" bases (for the polynucleotides) or amino acid residues (for the
polypeptides) that are identical when the sequences are aligned. Sequences can be
aligned in a number of different ways and sequence similarity can be determined in a
number of different ways. For example, the bases or amino acid residues of one
sequence can be aligned to a gap in the other sequence, or they can be aligned only to
another base or amino acid residue in the other sequence. A gap can range anywhere
from one nucleotide, base, or amino acid residue to multiple exons in length, up to
any number of nucleotides or amino acid residues. Further, sequences can be aligned
such that nucleotides (or bases) align with nucleotides, nucleotides align with amino
acid residues, or amino acid residues align with amino acid residues.
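
As an illustration of the percentage of aligned bases or residues that are identical, the following minimal sketch computes percent identity from two sequences that have already been aligned. It is not part of the original disclosure: the function name percent_identity is hypothetical, "-" is assumed to mark a gap, and positions aligned to a gap are assumed to be excluded from the count, which is only one of the several ways such a percentage can be determined.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned (non-gap) positions at which both sequences match.

    Assumptions: inputs are pre-aligned strings of equal length, and any
    position where either sequence has a gap is left out of the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # residue aligned to a gap, not to another residue
        aligned += 1
        if x == y:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


if __name__ == "__main__":
    # Illustrative alignment: five residue-to-residue positions, four identical.
    print(percent_identity("ACGT-A", "ACCTGA"))
```

Counting gap positions in the denominator instead would give a lower value for the same alignment, so the convention used should be stated whenever a percentage is reported.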

[0313] A "target sequence" can be any polynucleotide or amino acid
sequence of six or more contiguous nucleotides or two or more amino acids, for
example, from about 5 or from about 10 to about 100 amino acids, or from about 15
or from about 30 to about 300 nucleotides. A variety of comparing means can be
used to accomplish comparison of sequence information from a sample (e.g., to
analyze target sequences, target motifs, or relative expression levels) with the data
storage means. A skilled artisan can readily recognize that any one of the publicly
available homology search programs can be used as the search means for the
computer based systems of the present invention to accomplish comparison of target
sequences and motifs. Computer programs to analyze expression levels in a sample
and in controls are also known in the art. A "target sequence" includes an "antibody
target sequence," which refers to an amino acid sequence that can be used as an
immunogen for injection into animals for production of antibodies or for screening
against a phage display or antibody library for identification of binding partners.

[0314] A "target structural motif," or "target motif," refers to any rationally
selected sequence or combination of sequences in which the sequence(s) are chosen
based on a three-dimensional configuration that is formed upon the folding of the
target motif, or on consensus sequences of regulatory or active sites. There are a
variety of target motifs known in the art. Protein target motifs include, but are not
limited to, enzyme active sites and signal sequences. Nucleic acid target motifs
include, but are not limited to, hairpin structures, promoter sequences, and other
expression elements such as binding sites for transcription factors.

[0315] The term "host cell" includes an individual cell, cell line, cell culture,
or in vivo cell, which can be or has been a recipient of any polynucleotides or
polypeptides of the invention, for example, a recombinant vector, an isolated
polynucleotide, antibody or fusion protein. Host cells include progeny of a single
host cell, and the progeny may not necessarily be completely identical (in
morphology, physiology, or in total DNA, RNA, or polypeptide complement) to the
original parent cell due to natural, accidental, or deliberate mutation and/or change.
Host cells can be prokaryotic or eukaryotic, including mammalian, insect, amphibian,
reptile, crustacean, avian, fish, plant and fungal cells. A host cell includes cells
transformed, transfected, transduced, or infected in vivo or in vitro with a
polynucleotide of the invention, for example, a recombinant vector. A host cell which
comprises a recombinant vector of the invention may be called a "recombinant host
cell."

[0316] The term "agonist" refers to a substance that mimics the function of
an active molecule. Agonists include, but are not limited to, drugs, hormones,
antibodies, and neurotransmitters, as well as analogues and fragments thereof.

[0317] The term "antagonist" refers to a molecule that competes for the 
binding sites of an agonist, but does not induce an active response. Antagonists 
include, but are not limited to, drugs, hormones, antibodies, and neurotransmitters, as 
well as analogues and fragments thereof. 

[0318] The term "receptor" refers to a polypeptide that binds to a specific
extracellular molecule and may initiate a cellular response.

[0319] The term "ligand" refers to any molecule that binds to a specific site
on another molecule.

[0320] The term "over-expressed" refers to a state wherein there exists any
measurable increase over normal or baseline levels. For example, a molecule that is
over-expressed in a disorder is one that is manifest in a measurably higher level
compared to levels in the absence of the disorder.

Compositions 

[0321] The present invention provides novel isolated polynucleotides
encoding polypeptides and fragments thereof. The present invention also provides
novel isolated polypeptides, fragments thereof, and compositions comprising same.
The present invention further provides polynucleotide compositions that can be used
to identify the polypeptides.

[0322] The present invention provides recombinant vectors and host cells for
use in gene expression, primer pairs for use in hybridizations, computer-based
embodiments for use in bioinformatics, and transgenic animals and embryonic stem
cell lines for use in mutating and regulating gene expression.

Nucleic Acids

Sequences 

[0323] This invention provides genes encoding proteins, the encoded
proteins, and fragments and homologs thereof. It provides human polynucleotide
sequences and the corresponding mouse polynucleotide sequences.

[0324] The nucleic acids of the subject invention can encode all or a part of
the subject proteins. Double or single stranded fragments can be obtained from the
DNA sequence by chemically synthesizing oligonucleotides in accordance with
conventional methods, by restriction enzyme digestion, or by polymerase
chain reaction (PCR) amplification. The use of the polymerase chain reaction has
been described (Saiki et al., 1988) and current techniques have been reviewed
(Sambrook et al., 1989; McPherson et al., 2000; Dieffenbach and Dveksler, 1995).
For the most part, DNA fragments will be of at least about 5 nucleotides, at least
about 8 nucleotides, at least about 10 nucleotides, at least about 15 nucleotides, at
least about 18 nucleotides, at least about 20 nucleotides, at least about 25 nucleotides,
at least about 30 nucleotides, at least about 50 nucleotides, at least about 75
nucleotides, or at least about 100 nucleotides. Nucleic acid compositions that encode
at least six contiguous amino acids (i.e., fragments of 18 nucleotides or more), for
example, nucleic acid compositions encoding at least 8 contiguous amino acids (i.e.,
fragments of 24 nucleotides or more), are useful in directing the expression or the
synthesis of peptides that can be used as immunogens (Lerner, 1982; Shinnick et al.,
1983; Sutcliffe et al., 1983).

[0325] In some embodiments, a polynucleotide of the invention comprises a
nucleotide sequence of at least about 5, at least about 8, at least about 10, at least
about 15, at least about 18, at least about 20, at least about 25, at least about 30, at
least about 50, at least about 75, at least about 100, at least about 150, at least about
200, at least about 250, at least about 300, at least about 350, at least about 400, at
least about 450, at least about 500, at least about 550, at least about 600, at least about
650, at least about 700, at least about 750, at least about 800, at least about 850, at
least about 900, at least about 950, at least about 1000, at least about 1100, at least
about 1200, at least about 1300, at least about 1400, at least about 1500, at least about
1600, at least about 1700, at least about 1800, at least about 1900, at least about 2000,
at least about 2100, at least about 2200, at least about 2300, at least about 2400, at
least about 2500, at least about 3000, at least about 4000, or at least about 5000
contiguous nucleotides of any one of the sequences shown in SEQ ID NOS.: 1 - 209
and 419 - 627, or the coding region thereof, or a complement thereof.
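
Whether a candidate polynucleotide contains a given number of contiguous nucleotides of a reference sequence, or of its complement, can be checked computationally. The following is a minimal sketch, not part of the original disclosure; the function names are hypothetical, the sequences in the example are made up for illustration, and "complement" is assumed here to mean the reverse complement read 5' to 3'.

```python
def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (assumed meaning of "complement")."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest stretch of contiguous residues common to a and b."""
    a, b = a.upper(), b.upper()
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of positions.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def has_contiguous_match(query: str, reference: str, min_len: int) -> bool:
    """True if query shares at least min_len contiguous nucleotides with
    the reference or with its reverse complement."""
    forward = longest_shared_run(query, reference)
    reverse = longest_shared_run(query, reverse_complement(reference))
    return max(forward, reverse) >= min_len


if __name__ == "__main__":
    reference = "ATGGCCATTGTAATGGGCC"  # made-up example, not a SEQ ID NO
    print(has_contiguous_match("TTTGCCATTGTTT", reference, 8))
```

The dynamic-programming comparison is quadratic in sequence length; for sequences of several thousand nucleotides a dedicated search program such as BLAST would normally be used instead.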

[0326] In other embodiments, a polynucleotide of the invention has at least
about 60%, at least about 70%, at least about 75%, at least about 80%, at least about
85%, at least about 90%, at least about 95%, at least about 97%, at least about 98%, or
at least about 99% nucleotide sequence identity with a nucleotide sequence, or a fragment
thereof, of the coding region of any one of the sequences shown in SEQ ID NOS.: 1 -
209 and 419 - 627, or a complement thereof. These sequence variants include
naturally-occurring variants (e.g., SNPs, allelic variants, and homologs from other
species), degenerate variants, variants associated with disease or pathological states,
and variants resulting from random or directed mutagenesis, as well as from chemical
or other modification.

[0327] In some embodiments, a polynucleotide of the invention comprises a
nucleotide sequence that encodes a polypeptide comprising an amino acid sequence of
at least about 5, at least about 8, at least about 10, at least about 15, at least about 18,
at least about 20, at least about 25, at least about 30, at least about 50, at least about
75, at least about 100, at least about 150, at least about 200, at least about 250, at least
about 300, at least about 350, at least about 400, at least about 450, at least about 500,
at least about 550, at least about 600, at least about 650, at least about 700, at least
about 750, at least about 800, at least about 850, at least about 900, at least about 950,
or at least about 1000 contiguous amino acids of at least one of the sequences shown
in SEQ ID NOS.: 210 - 418 (e.g., a polypeptide encoded by at least one of the
nucleotide sequences shown in SEQ ID NOS.: 1 - 209 and 419 - 627), up to and
including an entire amino acid sequence as shown in SEQ ID NOS.: 210 - 418 (or as
encoded by at least one of the nucleotide sequences shown in SEQ ID NOS.: 1 - 209
and 419 - 627).

[0328] In some embodiments, the present invention includes a
polynucleotide selected from SEQ ID NOS.: 1 - 209 and 419 - 627, which contain 300
bp of the 5' terminus of a protein-encoding polynucleotide sequence. Such a
polynucleotide is useful for the purposes of clustering gene sequences to determine
gene family.

[0329] In further embodiments, a polynucleotide of the invention hybridizes
under stringent hybridization conditions to a polynucleotide having the coding region
of any one of the sequences shown in SEQ ID NOS.: 1 - 209 and 419 - 627, or a
complement thereof.

[0330] The polynucleotides of the invention include those that encode
variants of the polypeptide sequences encoded by the polynucleotides of the Sequence
Listing. In some embodiments, these polynucleotides encode variant polypeptides
that include insertions, additions, deletions, or substitutions compared with the
polypeptides encoded by the nucleotide sequences shown in SEQ ID NOS.: 1 - 209
and 419 - 627, and in Table 1. Conservative amino acid substitutions include
serine/threonine, valine/leucine/isoleucine, asparagine/histidine/glutamine, glutamic
acid/aspartic acid, etc. (Gonnet et al., 1992).

[0331] The nucleic acids of the invention include degenerate variants that
can be translated, according to the standard genetic code, to provide an amino acid
sequence identical to that translated from the nucleic acid sequences herein. For
example, synonymous codons include GGG, GGA, GGC, and GGU, each encoding
Glycine.
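
Degeneracy can be illustrated by translating two different nucleotide sequences to the same amino acid sequence. The following is a minimal sketch, not part of the original disclosure; the function name translate and the example sequences are hypothetical, and the codon table is deliberately partial, covering only a few codons of the standard genetic code.

```python
# Partial excerpt of the standard genetic code (RNA codons); illustration only.
CODON_TABLE = {
    "AUG": "M",                                      # methionine (start)
    "GGG": "G", "GGA": "G", "GGC": "G", "GGU": "G",  # glycine
    "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "UAA": "*", "UAG": "*", "UGA": "*",              # stop codons
}


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon, stopping at the first stop codon.

    Raises KeyError for codons outside the partial table above.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    peptide = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


if __name__ == "__main__":
    # Two made-up sequences that differ at the third position of each codon.
    print(translate("AUGGGGGCUUAA"), translate("AUGGGUGCAUGA"))
```

Both example sequences encode the same three residues even though they differ at three nucleotide positions, which is the sense in which one is a degenerate variant of the other.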

[0332] The nucleic acids of the invention include single nucleotide
polymorphisms (SNPs), which occur frequently in eukaryotic genomes (Lander et al.,
2001). The nucleotide sequence determined from one individual of a species can
differ from other allelic forms present within the population.

[0333] The nucleic acids of the invention include homologs of the
polynucleotides. The source of homologous genes can be any species, e.g., primate
species, particularly human; rodents, such as rats, hamsters, guinea pigs, and mice;
rabbits, canines, felines; cattle, such as bovines, goats, pigs, sheep, equines,
crustaceans, birds, chickens, reptiles, amphibians, fish, insects, plants, fungi, yeast,
nematodes, etc. Among mammalian species, e.g., human and mouse, homologs have
substantial sequence similarity, e.g., at least about 60% sequence identity, at least
about 75% sequence identity, or at least about 80% sequence identity among
nucleotide sequences. In many embodiments of interest, homology will be at least
about 75%, at least about 80%, at least about 85%, at least about 90%, at least about
95%, at least about 97%, or at least about 98%, where in certain embodiments of
interest homology will be as high as about 99%.

[0334] Modifications in the native structure of nucleic acids, including
alterations in the backbone, sugars or heterocyclic bases, have been shown to increase
intracellular stability and binding affinity. Among useful changes in the backbone
chemistry are phosphorothioates; phosphorodithioates, where both of the
non-bridging oxygens are substituted with sulfur; phosphoroamidites; alkyl
phosphotriesters and boranophosphates. Achiral phosphate derivatives include
3'-O'-5'-S-phosphorothioate, 3'-S-5'-O-phosphorothioate, 3'-CH2-5'-O-phosphonate
and 3'-NH-5'-O-phosphoroamidate. Peptide nucleic acids replace the entire ribose
phosphodiester backbone with a peptide linkage.

[0335] Sugar modifications are also used to enhance stability and affinity.
The α-anomer of deoxyribose can be used, where the base is inverted with respect to
the natural β-anomer. The 2'-OH of the ribose sugar can be altered to form
2'-O-methyl or 2'-O-allyl sugars, which provides resistance to degradation without
compromising affinity.

[0336] Modification of the heterocyclic bases must maintain proper base
pairing. Some useful substitutions include deoxyuridine for deoxythymidine;
5-methyl-2'-deoxycytidine and 5-bromo-2'-deoxycytidine for deoxycytidine.
5-propynyl-2'-deoxyuridine and 5-propynyl-2'-deoxycytidine have been shown to
increase affinity and biological activity when substituted for deoxythymidine and
deoxycytidine, respectively.

[0337] A genomic sequence of interest comprises the nucleic acid present
between the initiation codon and the stop codon, as defined in the listed sequences,
including all of the introns that are normally present in a native chromosome. It can
further include the 3' and 5' untranslated regions found in the mature mRNA. It can
further include specific transcriptional and translational regulatory sequences, such as
promoters, enhancers, etc., including about 1 kb, about 2 kb, and possibly more, of
flanking genomic DNA at either the 5' or 3' end of the transcribed region. The
genomic DNA can be isolated as a fragment of 100 kbp or smaller, and substantially
free of flanking chromosomal sequence. The genomic DNA flanking the coding
region, either 3' or 5', or internal regulatory sequences as sometimes found in introns,
contains sequences required for proper tissue and stage specific expression.

[0338] Nucleic acid molecules of the invention can comprise heterologous 
nucleic acid molecules, i.e., nucleic acid molecules other than the subject nucleic acid 
molecules, of any length. For example, the subject nucleic acid molecules can be 
flanked on the 5' and/or 3' ends by heterologous nucleic acid molecules of from about
1 nucleotide to about 10 nucleotides, from about 10 nucleotides to about 20
nucleotides, from about 20 nucleotides to about 50 nucleotides, from about 50
nucleotides to about 100 nucleotides, from about 100 nucleotides to about 250
nucleotides, from about 250 nucleotides to about 500 nucleotides, or from about 500
nucleotides to about 1000 nucleotides, or more in length.

[0339] The subject polynucleotides include those that encode fusion proteins
comprising the subject polypeptides fused to "fusion partners." For example, the
present soluble receptor or ligand can be fused to an immunoglobulin fragment, such
as an Fc fragment, for stability in circulation or to fix complement. Other polypeptide
fragments that have equivalent capabilities as the Fc fragments can also be used
herein.

[0340] The isolated nucleic acids of the invention can be used as probes to
detect and characterize gross alteration in a genomic locus, such as deletions,
insertions, translocations, and duplications, e.g., using fluorescence in situ
hybridization (FISH) techniques to examine chromosome spreads (Andreeff et al.,
1999). The nucleic acids are also useful for detecting smaller genomic alterations,
such as deletions, insertions, additions, translocations, and substitutions (e.g., SNPs).

[0341] When used as probes to detect nucleic acid molecules capable of
hybridizing with nucleic acids described in the Sequence Listing, the nucleic acid
molecules can be flanked by heterologous sequences of any length. When used as
probes, a subject nucleic acid can include nucleotide analogs that incorporate labels
that are directly detectable, such as radiolabels or fluorophores, or nucleotide analogs
that incorporate labels that can be visualized in a subsequent reaction, such as biotin
or various haptens. Haptens that are commonly conjugated to nucleotides for
subsequent labeling include biotin, digoxigenin, and dinitrophenyl.

[0342] Suitable fluorescent labels include fluorochromes, e.g., fluorescein
and its derivatives, e.g., fluorescein isothiocyanate (FITC), 6-carboxyfluorescein (6-
FAM), 2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), 6-carboxy-
2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein (5-FAM); coumarin
and its derivatives, e.g., 7-amino-4-methylcoumarin, aminocoumarin; bodipy dyes,
such as Bodipy FL; cascade blue; Oregon green; rhodamine dyes, e.g., rhodamine, 6-
carboxy-X-rhodamine (ROX), Texas red, phycoerythrin, and tetramethylrhodamine;
eosins and erythrosins; cyanine dyes, e.g., allophycocyanin, Cy3 and Cy5 or
N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA); macrocyclic chelates of
lanthanide ions, e.g., quantum dye, etc.; and chemiluminescent molecules, e.g.,
luciferases.

[0343] Fluorescent labels also include a green fluorescent protein (GFP), i.e., 
a "humanized" version of a GFP, e.g., wherein codons of the naturally-occurring 
nucleotide sequence are changed to more closely match human codon bias; a GFP 
derived from Aequorea victoria or a derivative thereof, e.g., a "humanized" derivative 
such as Enhanced GFP, which are available commercially, e.g., from Clontech, Inc.; 
other fluorescent mutants of a GFP from Aequorea victoria, e.g., as described in U.S. 
Patent No. 6,066,476; 6,020,192; 5,985,577; 5,976,796; 5,968,750; 5,968,738; 
5,958,713; 5,919,445; 5,874,304; a GFP from another species such as Renilla 
reniformis, Renilla mulleri, or Ptilosarcus gurneyi, as previously described (WO 
99/49019; Peelle et al., 2001), "humanized" recombinant GFP (hrGFP) (Stratagene®); 
any of a variety of fluorescent and colored proteins from Anthozoan species (e.g., 
Matz et al., 1999). 

[0344] Probes can also contain fluorescent analogs, including commercially 
available fluorescent nucleotide analogs that can readily be incorporated into a subject 
nucleic acid. These include deoxyribonucleotide and/or ribonucleotide analogs 
labeled with Cy3, Cy5, Texas Red, Alexa Fluor dyes, rhodamine, cascade blue, or 
BODIPY, and the like. 

[0345] Suitable radioactive labels include, e.g., ³²P and ³⁵S. For 
example, probes can contain radiolabeled analogs, including those commonly labeled 
with ³²P or ³⁵S, such as α-³²P-dATP, -dTTP, -dCTP, and -dGTP; γ-³⁵S-GTP and 
α-³⁵S-dATP, and the like. 

[0346] Nucleic acids of the invention can also be bound to a substrate. 
Subject nucleic acids can be attached covalently, attached to a surface of the support 
or applied to a derivatized surface in a chaotropic agent that facilitates denaturation 
and adherence, e.g., by noncovalent interactions, or some combination thereof. The 
nucleic acids can be bound to a substrate to which a plurality of other nucleic acids 
are concurrently bound, hybridization to each of the plurality of the bound nucleic 
acids being separately detectable. 

[0347] The substrate can be porous or solid, planar or non-planar, unitary or 
distributed; and the bond between the nucleic acid and the substrate can be covalent or 
non-covalent. The substrate can be in the form of microbeads or nanobeads. 
Substrates include, but are not limited to, a membrane, such as nitrocellulose, nylon, 
positively-charged derivatized nylon; a solid substrate such as glass, amorphous 
silicon, crystalline silicon, plastics (including e.g., polymethylacrylic, polyethylene, 
polypropylene, polyacrylate, polymethylmethacrylate, polyvinylchloride, 
polytetrafluoroethylene, polystyrene, polycarbonate, polyacetal, polysulfone, cellulose 
acetate, or mixtures thereof). 

[0348] The subject nucleic acids include antisense RNA, ribozymes, and 
RNAi. Further, the nucleic acids of the invention can be used for antisense or RNAi 
inhibition of transcription or translation using methods known in the art (Phillips, 
1999a; Phillips, 1999b; Hartmann et al., 1999; Stein et al., 1998; Agrawal et al., 
1998). 

Expression Vectors 

[0349] The instant invention further provides host cells, e.g., recombinant 
host cells, that comprise a subject nucleic acid, host cells that comprise a recombinant 
vector, and host cells that secrete antibodies of the invention. Subject host cells can 
be cultured in vitro, or can be part of a multicellular organism. Host cells are 
described in more detail below. The instant invention further provides transgenic 
plants and non-human animals, as described in more detail below. 

[0350] In addition to the plurality of uses described in greater detail in the 
following sections, the subject nucleic acids find use in the preparation of all or a 
portion of the polypeptides of the subject invention, as described above, using an 
expression system. For expression, an expression vector can be employed. The 
expression vector will provide a transcriptional and translational initiation region, 
which may be inducible, conditionally-active, constitutive, or tissue-specific, where 
the coding region is operably linked under the transcriptional control of the 
transcriptional initiation region, and a transcriptional and translational termination 
region. These control regions can be native to a gene encoding the subject peptides, 
or can be derived from heterologous or exogenous sources. 

[0351] The subject nucleic acids can also be provided as part of a vector 
(e.g., a polynucleotide construct comprising an expression cassette), a wide variety of 
which are known in the art. Vectors include, but are not limited to, plasmids; 
cosmids; viral vectors; human, yeast, bacterial, P1-derived artificial chromosomes 
(HACs, YACs, BACs, PACs, etc.), mini-chromosomes, and the like. Vectors are 
amply described in numerous publications well known to those in the art (Ausubel et 
al.; Jones et al., 1998a; Jones et al., 1998b). Vectors can provide for nucleic acid 
expression, for nucleic acid propagation, or both. 

[0352] A recombinant vector or construct that includes a nucleic acid of the 
invention is useful for propagating a nucleic acid in a host cell; such vectors are 
known as "cloning vectors." Vectors can transfer nucleic acid between host cells 
derived from disparate organisms; these are known in the art as "shuttle vectors." 
Vectors can also insert a subject nucleic acid into a host cell's chromosome; these are 
known in the art as "insertion vectors." Vectors can express either sense or antisense 
RNA transcripts of the invention in vitro (e.g., in a cell-free system or within an in 
vitro cultured host cell) or in vivo (e.g., in a multicellular plant or animal); these are 
known in the art as "expression vectors," which can be part of an expression system. 
Expression vectors can also produce a subject antibody. 
Vectors typically include at least one origin of replication, at least one site for 
insertion of heterologous nucleic acid (e.g., in the form of a polylinker with multiple, 
tightly clustered, single cutting restriction endonuclease recognition sites), and at least 
one selectable marker, although some integrative vectors will lack an origin that is 
functional in the host to be chromosomally modified, and some vectors will lack 
selectable markers. Vectors can be transiently or stably maintained in the cells, 
usually for a period of at least about one day, at least about several days, or at least 
about several weeks. 

[0353] Prior to vector insertion, the DNA of interest will be obtained 
substantially free of other nucleic acid sequences. The DNA can be "recombinant," 
and flanked by one or more nucleotides with which it is not normally associated on a 
naturally occurring chromosome. 

[0354] Expression vectors generally have convenient restriction sites located 
near the promoter sequence to provide for the insertion of nucleic acid sequences 
encoding heterologous protein or RNA molecules. A selectable marker operative in 
the expression system or host can be present. Expression vectors can be used for the 
production of fusion proteins, where the fusion peptide provides additional 
functionality, i.e., increased protein synthesis, a leader sequence for secretion, 
stability, reactivity with defined antisera, or an enzyme marker, e.g., β-galactosidase. 

[0355] Promoters of the invention can be naturally contiguous or not 
naturally contiguous to the expressed nucleic acid molecule. The promoters can be 
inducible, conditionally active (such as the cre-lox promoter), constitutive, and/or 
tissue specific. 

[0356] Expression vectors can be prepared comprising a transcription 
cassette comprising a transcription initiation region, the gene or fragment thereof, and 
a transcriptional termination region. Of particular interest is the use of DNA 
sequences that allow for the expression of functional epitopes or domains, at least 
about 5, at least about 8, at least about 10, at least about 15, at least about 18, at least 
about 20, at least about 25, at least about 30, at least about 50, at least about 75, at 
least about 100, at least about 150, at least about 200, at least about 250, at least about 
300, at least about 350, at least about 400, at least about 450, at least about 500, at 
least about 550, at least about 600, at least about 650, at least about 700, at least about 
750, at least about 800, at least about 850, at least about 900, at least about 950, or at 
least about 1000 amino acids in length, or any of the above-described fragments, up to 
and including the complete open reading frame of the gene. After introduction of 
these DNA sequences, the cells containing the vector construct can be selected by 
means of a selectable marker, and the selected cells expanded and used as expression- 
competent host cells. 

[0357] Host cells can comprise prokaryotes or eukaryotes that express 
proteins and polypeptides in accordance with conventional methods, the method 
depending on the purpose for expression. For large scale production of the protein, a 
unicellular organism, such as E. coli, B. subtilis, S. cerevisiae, insect cells in 
combination with baculovirus vectors, or cells of a higher organism such as 
vertebrates, particularly mammals, e.g., COS 7 cells, can be used as the expression 
host cells. In some situations, it is desirable to express eukaryotic genes in eukaryotic 
cells, where the encoded protein will benefit from native folding and post- 
translational modifications. 

[0358] Specific expression systems of interest include plants, bacteria, yeast, 
insect cells, and mammalian cell-derived expression systems. Representative systems 
from each of these categories are provided below. 

[0359] Expression systems in plants include those described in U.S. Patent 
No. 6,096,546 and U.S. Patent No. 6,127,145. 

[0360] Expression systems in bacteria include those described by Chang et 
al., 1978; Goeddel et al., 1979; Goeddel et al., 1980; EP 0 036,776; U.S. Patent No. 
4,551,433; DeBoer et al., 1983; and Siebenlist et al., 1980. 

[0361] Expression systems in yeast include those described by Hinnen et al., 
1978; Ito et al., 1983; Kurtz et al., 1986; Kunze et al., 1985; Gleeson et al., 1986; 
Roggenkamp et al., 1986; Das et al., 1984; De Louvencourt et al., 1983; Van den 
Berg et al., 1990; Kunze et al., 1985; Cregg et al., 1985; U.S. Patent Nos. 4,837,148 
and 4,929,555; Beach and Nurse, 1981; Davidow et al., 1985; Gaillardin et al., 1985; 
Ballance et al., 1983; Tilburn et al., 1983; Yelton et al., 1984; Kelly and Hynes, 1985; 
EP 0 244,234; WO 91/00357; and U.S. Patent No. 6,080,559. 

[0362] Expression systems for heterologous genes in insects include those 
described in U.S. Patent No. 4,745,051; Friesen et al., 1986; EP 0 127,839; EP 0 
155,476; Vlak et al., 1988; Miller et al., 1988; Carbonell et al., 1988; Maeda et al., 
1985; Lebacq-Verheyden et al., 1988; Smith et al., 1985; Miyajima et al., 1987; and 
Martin et al., 1988. Numerous baculoviral strains and variants and corresponding 
permissive insect host cells are described in Luckow et al., 1988, Miller et al., 1986, 
and Maeda et al., 1985. The insect cell expression system is useful not only for 
production of heterologous proteins intracellularly, but can also be used for expression of 
transmembrane proteins on the insect cell surfaces. Such insect cells can be used as 
immunogen for production of antibodies, for example, by injection of the insect cells 
into mice or rabbits or other suitable animals. 

[0363] Mammalian expression systems include those described in Dijkema 
et al., 1985; Gorman et al., 1982; Boshart et al., 1985; and U.S. Patent No. 4,399,216. 
Additional features of mammalian expression are facilitated as described in Ham and 
Wallace, 1979; Barnes and Sato, 1980; U.S. Patent Nos. 4,767,704, 4,657,866, 
4,927,762, 4,560,655, WO 90/103430, WO 87/00195, and U.S. RE 30,985. 
Mammalian cell expression systems can also be used for production of antibodies. 

[0364] The present polynucleotides can also be used in cell-free expression 
systems, such as a bacterial system, e.g., E. coli lysate, a rabbit reticulocyte lysate system, 
a wheat germ extract system, a frog oocyte lysate system, and the like, which are 
conventional in the art. See, for example, WO 00/68412, WO 01/27260, WO 
02/24939, WO 02/38790, WO 91/02076, and WO 91/02075. 

[0365] When any of the above-referenced host cells, or other appropriate 
host cells or organisms, are used to replicate and/or express the polynucleotides of the 
invention, the resulting replicated nucleic acid, RNA, expressed protein or 
polypeptide, is within the scope of the invention as a product of the host cell or 
organism. 

[0366] Once the gene corresponding to a selected polynucleotide is 
identified, its expression can be regulated in the gene's native cell types. For example, 
an endogenous gene of a cell can be regulated by an exogenous regulatory sequence 
inserted into the genome of the cell at a location that will enhance or reduce 
expression of the gene corresponding to the subject polypeptide. The regulatory 
sequence can be designed to integrate into the genome via homologous 
recombination, as disclosed in U.S. Patent Nos. 5,641,670 and 5,733,761, the 
disclosures of which are herein incorporated by reference. Alternatively, it can be 
designed to integrate into the genome via non-homologous recombination, as 
described in WO 99/15650, the disclosure of which is also herein incorporated by 
reference. Also encompassed in the subject invention is the production of proteins 
without manipulating the encoding nucleic acid itself, but rather by integrating a 
regulatory sequence into the genome of a cell that already includes a gene that 
encodes the protein of interest; this production method is described in the above- 
incorporated patent documents. 

Isolated Primer Pairs 

[0367] In some embodiments, the invention provides isolated nucleic acids 
that, when used as primers in a polymerase chain reaction, amplify a subject 
polynucleotide, or a polynucleotide containing a subject polynucleotide. The 
amplified polynucleotide is from about 20 to about 50, from about 50 to about 75, 
from about 75 to about 100, from about 100 to about 125, from about 125 to about 
150, from about 150 to about 175, from about 175 to about 200, from about 200 to 
about 250, from about 250 to about 300, from about 300 to about 350, from about 350 
to about 400, from about 400 to about 500, from about 500 to about 600, from about 
600 to about 700, from about 700 to about 800, from about 800 to about 900, from 
about 900 to about 1000, from about 1000 to about 2000, from about 2000 to about 
3000, from about 3000 to about 4000, from about 4000 to about 5000, or from about 
5000 to about 6000 nucleotides or more in length. 

[0368] The isolated nucleic acids themselves are from about 10 to about 20, 
from about 20 to about 30, from about 30 to about 40, from about 40 to about 50, 
from about 50 to about 100, or from about 100 to about 200 nucleotides in length. 
Generally, the nucleic acids are used in pairs in a polymerase chain reaction, where 
they are referred to as "forward" and "reverse" primers. 

[0369] Thus, in some embodiments, the invention provides a pair of isolated 
nucleic acid molecules, each from about 10 to about 200 nucleotides in length, the 
first nucleic acid molecule of the pair comprising a sequence of at least 10 contiguous 
nucleotides having 100% sequence identity to a nucleic acid sequence as shown in 
SEQ ID NOS.: 1 - 209 and 419 - 627 and the second nucleic acid molecule of the pair 
comprising a sequence of at least 10 contiguous nucleotides having 100% sequence 
identity to the reverse complement of the nucleic acid sequence shown in SEQ ID 
NOS.: 1 - 209 and 419 - 627, wherein the sequence of the second nucleic acid 
molecule is located 3' of the nucleic acid sequence of the first nucleic acid molecule 
shown in SEQ ID NOS.: 1 - 209 and 419 - 627. The primer nucleic acids are prepared 
using any known method, e.g., automated synthesis, and can be chosen to specifically 
amplify a cDNA copy of an mRNA encoding a subject polypeptide. 
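
By way of illustration only, the positional relationship between the forward and 
reverse primers described above can be sketched in Python. The function names, the 
default minimum length argument, and the simplification that each primer matches 
the template over its entire length (rather than over any 10 contiguous nucleotides) 
are assumptions made for this sketch; they are not taken from the Sequence Listing. 

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def is_valid_primer_pair(template: str, forward: str, reverse: str,
                         min_len: int = 10) -> bool:
    """Check a hypothetical forward/reverse primer pair against a template.

    Simplifying assumptions: the forward primer must match the template
    exactly over its whole length, and the reverse primer must match the
    reverse complement of the template exactly, at a site located 3' of
    the forward primer's site. Only the first forward site is considered.
    """
    template, forward, reverse = template.upper(), forward.upper(), reverse.upper()
    if len(forward) < min_len or len(reverse) < min_len:
        return False
    fwd_start = template.find(forward)
    if fwd_start == -1:
        return False
    # The reverse primer anneals to the strand complementary to the template,
    # so its reverse complement is what appears on the template strand.
    rev_site = reverse_complement(reverse)
    rev_start = template.find(rev_site, fwd_start + len(forward))
    return rev_start != -1
```

In this sketch a pair is accepted only when the site of the reverse primer begins 
after the end of the forward primer on the template strand, which corresponds to the 
"located 3'" condition recited above. 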

[0370] In some embodiments, the first and/or the second nucleic acid 
molecules comprise a detectable label. The label can be a radioactive molecule, 
fluorescent molecule or another molecule, e.g., hapten, as described in detail above. 
Further, the label can be a two stage system, where the amplified DNA is conjugated 
to another molecule, i.e., biotin, digoxin, or a hapten, that has a high affinity binding 
partner, i.e., avidin, antidigoxin, or a specific antibody, respectively, and the binding 
partner is conjugated to a detectable label. The label can be conjugated to one or both of 
the primers. Alternatively, the pool of nucleotides used in the amplification is 
labeled, so as to incorporate the label into the amplification product. 

[0371] Conditions that increase stringency of both DNA/DNA and 
DNA/RNA hybridization reactions are widely known and published in the art. See, 
for example, Sambrook, 1989, and examples provided above. Examples of relevant 
conditions include (in order of increasing stringency): incubation temperatures of 
25°C, 37°C, 50°C, and 68°C; buffer concentrations of 10 x SSC, 6 x SSC, 1 x SSC, 0.1 
x SSC (where 1 x SSC is 0.15 M NaCl and 15 mM citrate buffer); and their 
equivalents using other buffer systems; formamide concentrations of 0%, 25%, 50%, 
and 75%; incubation times from 5 minutes to 24 hours; 1, 2, or more washing steps; 
wash incubation times of 1, 2, or 15 minutes; and wash solutions of 6 x SSC, 1 x SSC, 
0.1 x SSC, or deionized water. 
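
As a worked illustration of the buffer concentrations recited above, the salt 
molarity of any SSC dilution follows from the stated definition of 1 x SSC (0.15 M 
NaCl and 15 mM citrate). The following Python sketch is offered only as an aid to 
calculation; the function and constant names are hypothetical. 

```python
from typing import Tuple

SSC_1X_NACL_M = 0.15      # mol/L NaCl in 1 x SSC, as defined in the text
SSC_1X_CITRATE_M = 0.015  # mol/L citrate in 1 x SSC (15 mM)


def ssc_molarity(fold: float) -> Tuple[float, float]:
    """Return (NaCl, citrate) molarity in mol/L for an N x SSC solution."""
    # Rounding removes floating-point noise such as 0.8999999999999999.
    return (round(fold * SSC_1X_NACL_M, 6), round(fold * SSC_1X_CITRATE_M, 6))


# Buffer concentrations listed in order of increasing stringency
# (lower salt concentration corresponds to higher stringency).
for fold in (10, 6, 1, 0.1):
    nacl, citrate = ssc_molarity(fold)
    print(f"{fold} x SSC: {nacl} M NaCl, {citrate} M citrate")
```

Under these definitions, 6 x SSC corresponds to 0.9 M NaCl and 0.09 M citrate, and 
20 x SSC to 3.0 M NaCl and 0.3 M citrate. 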

[0372] For example, "high stringency conditions" include hybridization in 
50% formamide, 5X SSC, 0.2 µg/µl poly(dA), 0.2 µg/µl human cot1 DNA, and 0.5% 
SDS, in a humid oven at 42°C overnight, followed by successive washes in 1X SSC, 
0.2% SDS at 55°C for 5 minutes, followed by washing at 0.1X SSC, 0.2% SDS at 
55°C for 20 minutes. Further examples of high stringency conditions include 
hybridization at 50°C and 0.1 x SSC (15 mM sodium chloride/1.5 mM sodium citrate); 
overnight incubation at 42°C in a solution containing 50% formamide, 1 x SSC (150 
mM NaCl, 15 mM sodium citrate), 50 mM sodium phosphate (pH 7.6), 5 x 
Denhardt's solution, 10% dextran sulfate, and 20 µg/ml denatured, sheared salmon 
sperm DNA, followed by washing the filters in 0.1 x SSC at about 65°C. High 
stringency conditions also include aqueous hybridization (e.g., free of formamide) in 
6X SSC (where 20X SSC contains 3.0 M NaCl and 0.3 M sodium citrate), 1% sodium 
dodecyl sulfate (SDS) at 65°C for about 8 hours (or more), followed by one or more 
washes in 0.2 X SSC, 0.1% SDS at 65°C. Highly stringent hybridization conditions 
are hybridization conditions that are at least as stringent as any one of the above 
representative conditions. Other stringent hybridization conditions are known in the 
art and can also be employed to identify nucleic acids of this particular embodiment 
of the invention. 

[0373] Conditions of "reduced stringency," suitable for hybridization to 
molecules encoding structurally and functionally related proteins, or otherwise 
serving related or associated functions, are the same as those for high stringency 
conditions but with a reduction in temperature for hybridization and washing to lower 
temperatures (e.g., room temperature or about 22°C to 25°C). For example, moderate 
stringency conditions include aqueous hybridization (e.g., free of formamide) in 6X 
SSC, 1% SDS at 65°C for about 8 hours (or more), followed by one or more washes in 
2X SSC, 0.1% SDS at room temperature. Low stringency conditions include, for 
example, aqueous hybridization at 50°C and 6 x SSC (0.9 M sodium chloride/0.09 M 
sodium citrate) and washing at 25°C in 1 x SSC (0.15 M sodium chloride/0.015 M 
sodium citrate). 

[0374] The specificity of a hybridization reaction allows any single-stranded 
sequence of nucleotides to be labeled with a radioisotope or chemical and used as a 
probe to find a complementary strand, even in a cell or cell extract that contains 
millions of different DNA and RNA sequences. Probes of this type are widely used to 
detect the nucleic acids corresponding to specific genes, both to facilitate the 
purification and characterization of the genes after cell lysis and to localize them in 
cells, tissues, and organisms. 

[0375] Moreover, by carrying out hybridization reactions under conditions of 
"reduced stringency," a probe prepared from one gene can be used to find 
homologous evolutionary relatives - both in the same organism, where the relatives 
form part of a gene family, and in other organisms, where the evolutionary history of 
the nucleotide sequence can be traced. A person skilled in the art would recognize 
how to modify the conditions to achieve the requisite degree of stringency for a 
particular hybridization. 

Libraries 

[0376] The polynucleotide libraries of the invention generally comprise a 
collection of sequence information of a plurality of polynucleotide sequences, where 
at least one of the polynucleotides has a sequence shown in SEQ ID NOS.: 1 - 209 and 
419 - 627. By plurality is meant at least 2, at least 3, or at least all of the sequences in 
the Sequence Listing. The information may be provided in either biochemical form 
(e.g., as a collection of polynucleotide molecules), or in electronic form (e.g., as a 
collection of polynucleotide sequences stored in a computer-readable form, as in a 
computer-based system, a computer data file, and/or as a part of a computer program). 
The length and number of polynucleotides in the library will vary with the nature of 
the library, e.g., if the library is an oligonucleotide array, a cDNA array, or a 
computer database of the sequence information. 

[0377] The sequence information contained in either a biochemical or an 
electronic library of polynucleotides can be used in a variety of ways, e.g., as a 
resource for gene discovery, as a representation of sequences expressed in a selected 
cell type (e.g., cell type markers), or as markers of a given disorder or disease state. 
In general, a disease marker is a representation of a gene product that is present in all 
cells affected by disease either at an increased or decreased level relative to a normal 
cell (e.g., a cell of the same or similar type that is not substantially affected by 
disease). For example, a polynucleotide sequence in a library can be a polynucleotide 
that represents an mRNA, polypeptide, or other gene product encoded by the 
polynucleotide, that is either over-expressed or under-expressed in one cell compared 
to another (e.g., a first cell type compared to a second cell type; a normal cell 
compared to a diseased cell; a cell not exposed to a signal or stimulus compared to a 
cell exposed to that signal or stimulus; and the like). 

[0378] The nucleotide sequence information of the library can be embodied 
in any suitable form, e.g., electronic or biochemical forms. For example, a library of 
sequence information embodied in electronic form comprises an accessible computer 
data file that may contain the representative nucleotide sequences of genes that are 
differentially expressed (e.g., over-expressed or under-expressed) as between, e.g., a 
first cell type compared to a second cell type (e.g., expression in a brain cell compared 
to expression in a kidney cell); a normal cell compared to a diseased cell (e.g., a non- 
cancerous cell compared to a cancerous cell); a cell not exposed to an internal or 
external signal or stimulus compared to a cell exposed to that signal or stimulus (e.g., 
a cell contacted with a ligand compared to a control cell not contacted with the 
ligand); and the like. Other combinations and comparisons of cells will be readily 
apparent to the ordinarily skilled artisan. Biochemical embodiments of the library 
include a collection of nucleic acid molecules that have the sequences of the genes in 
the library, where the nucleic acids can correspond to the entire gene in the library or 
to a fragment thereof, as described in greater detail below. 

[0379] Where the library is an electronic library, the nucleic acid sequence 
information can be present in a variety of media. For example, the nucleic acid 
sequences of any of the polynucleotides shown in SEQ ID NOS.: 1 - 209 and 419 - 
627 can be recorded on computer readable media of a computer-based system, e.g., 
any medium that can be read and accessed directly by a computer. One of skill in the 
art can readily appreciate how any of the presently known computer readable 
media can be used to create a manufacture comprising a recording of the present 
sequence information. Any convenient data storage structure can be chosen, based on 
the means used to access the stored information. A variety of data processor 
programs and formats can be used for storage, e.g., word processing text file, database 
format, etc. In addition to the sequence information, electronic versions of the 
libraries of the invention can be provided in conjunction or connection with other 
computer-readable information and/or other types of computer-based files (e.g., 
searchable files, executable files, etc., including, but not limited to, for example, 
search program software, etc.). 

[0380] By providing the nucleotide sequence in computer readable form in a 
computer-based system, the information can be accessed for a variety of purposes. 
Computer software to access sequence information is publicly available. 
Conventional bioinformatics tools can be utilized to analyze sequences to determine 
sequence identity, sequence similarity, and gap information. For example, the gapped 
BLAST (Altschul et al., 1990, Altschul et al., 1997), and BLAZE (Brutlag et al., 
1993) search algorithms on a Sybase system, or the TeraBLAST (TimeLogic, Crystal 
Bay, Nevada) program optionally running on a specialized computer platform 
available from TimeLogic, can be used to identify open reading frames (ORFs) within 
the genome that contain homology to ORFs from other organisms. Homology 
between sequences of interest can be determined using the local homology algorithm 
of Smith and Waterman, 1981, as well as the BestFit program (Rechid et al., 1989), 
and the FastDB algorithm (FastDB, 1988; described in Current Methods in Sequence 
Comparison and Analysis, Macromolecule Sequencing and Synthesis, Selected 
Methods and Applications, pp. 127-149, 1988, Alan R. Liss, Inc.). 
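
For purposes of illustration, a local alignment of the kind performed by the 
algorithm of Smith and Waterman can be sketched in Python as follows. The scoring 
values (match +2, mismatch -1, linear gap penalty -2) and the function name are 
assumptions chosen for this sketch; they are not parameters specified herein, and 
production tools such as those named above use more elaborate scoring schemes. 

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2):
    """Return (best score, aligned part of a, aligned part of b).

    Minimal local alignment with a linear gap penalty (illustrative values).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero, which is what
    # makes the alignment local rather than global.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

With the illustrative scoring above, aligning "TTTACGTGG" against "CCACGTAA" 
identifies the shared region "ACGT" as the region of strongest local alignment. 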

[0381] Alignment programs that permit gaps in the sequence include 
ClustalW (Thompson et al., 1994), FASTA3 (Pearson, 2000), Align0 (Myers and 
Miller, 1988), and TCoffee (Notredame et al., 2000). Other methods for comparing 
and aligning nucleotide and protein sequences include, for example, BLASTX 
(NCBI), the Wise package (Birney and Durbin, 2000), and FASTX (Pearson, 2000). 
These algorithms determine sequence homology between nucleotide and protein 
sequences without translating the nucleotide sequences into protein sequences. Other 
techniques for alignment are also known in the art (Doolittle, et al., 1996; BLAST, 
available from the National Center for Biotechnology Information; FASTA, available 
in the Genetics Computing Group (GCG) package, from Madison, Wisconsin, USA, a 
wholly owned subsidiary of Oxford Molecular Group, Inc.; Schlessinger, 1988a; 
Schlessinger, 1988b; and Needleman and Wunsch, 1970). 

[0382] Sequence similarity is calculated based on a reference sequence, 
which may be a subset of a larger sequence, such as a conserved motif, coding region, 
flanking region, etc. The reference sequence is usually at least about 18 nt long, at 
least about 30 nt long, or may extend to the complete sequence that is being 
compared. 

[0383] One parameter for determining percent sequence identity is the 
percentage of the alignment in the region of strongest alignment between a target and 
a query sequence. Methods for determining this percentage involve, for example, 
counting the number of aligned bases of a query sequence in the region of strongest 
alignment and dividing this number by the total number of bases in the region. For 
example, 10 matches divided by 11 total residues gives a percent sequence identity of 
approximately 90.9%. The length of the aligned region is typically at least about 
55%, at least about 58%, or at least about 60% of the total sequence length, and can 
be as great as about 62%, as great as about 64%, and even as great as about 66% of 
the total sequence length. 
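
The calculation described above (matches divided by the total number of positions 
in the aligned region) can be sketched in Python as follows. The function name and 
the example sequences are hypothetical, and the sketch assumes that the two strings 
passed in have already been aligned, with "-" marking any gap position. 

```python
def percent_identity(aligned_query: str, aligned_target: str) -> float:
    """Percent identity over an aligned region of equal-length strings."""
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("aligned region is empty")
    # A position counts as a match only if both residues agree and
    # neither is a gap character.
    matches = sum(
        1 for q, t in zip(aligned_query, aligned_target) if q == t and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Hypothetical aligned region: 10 matching positions out of 11.
identity = percent_identity("ACGTACGTACG", "ACGTACGTACC")
print(f"{identity:.1f}%")
```

For the hypothetical 11-position region shown, the result corresponds to the 10/11 
example given above. 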

[0384] The present invention includes human and mouse polynucleotide and 
polypeptide sequences that are at least about 95%, at least about 96%, at least about 
97%, at least about 98%, or at least about 99% homologous to the sequences in the 
Sequence Listing, based on using the method of determining sequence identity with 
the insertion of gaps to detect the maximum degree of sequence identity. In other 
embodiments of interest, homology will be at least about 80%, at least about 85%, or 
as high as about 90%. 

[0385] A variety of structural formats for the input and output means can be 
used to input and output the information in the computer-based systems of the present 
invention. One format for an output means ranks the relative expression levels of 
different polynucleotides. Such presentation provides a skilled artisan with a ranking 
of relative expression levels to determine a gene expression profile. 
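
One possible form of such a ranked output is sketched below in Python, purely by 
way of example. The identifiers and expression values are invented placeholders 
rather than data from the Sequence Listing, and the function name is hypothetical. 

```python
from typing import Dict, List, Tuple


def rank_by_expression(levels: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (identifier, level) pairs sorted from highest to lowest level."""
    return sorted(levels.items(), key=lambda item: item[1], reverse=True)


# Placeholder relative expression levels for three hypothetical polynucleotides.
example_levels = {"polynucleotide_A": 1.0, "polynucleotide_B": 3.0, "polynucleotide_C": 2.0}
for rank, (name, level) in enumerate(rank_by_expression(example_levels), start=1):
    print(rank, name, level)
```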

[0386] As discussed above, the library of the invention also encompasses 
biochemical libraries of the polynucleotides shown in SEQ ID NOS.: 1 - 209 and 419 
- 627, e.g., collections of nucleic acids representing the provided polynucleotides. 
The biochemical libraries can take a variety of forms, e.g., a solution of cDNAs, a 
pattern of probe nucleic acids stably associated with a surface of a solid support (i.e., 
an array) and the like. Of particular interest are nucleic acid arrays in which one or 
more of the polynucleotide sequences shown in SEQ ID NOS.: 1 - 209 and 419 - 627 
is represented on the array. A variety of different array formats have been developed 
and are known to those of skill in the art. The arrays of the subject invention find use 
in a variety of applications, including gene expression analysis, drug screening, 
mutation analysis, and the like, as disclosed in the herein-listed exemplary patent 
documents. 

[0387] In addition to the above nucleic acid libraries, analogous libraries of 
polypeptides are also provided, where the polypeptides of the library will represent at 
least a portion of the polypeptides encoded by a gene corresponding to one or more of 
the sequences shown in SEQ ID NOS.: 1 - 209 and 419 - 627. 

[0388] Further, analogous libraries of antibodies are also provided, where the 
libraries comprise antibodies or fragments thereof that specifically bind to at least a 
portion of at least one of the subject polypeptides. Further, antibody libraries may 
comprise antibodies or fragments thereof that specifically inhibit binding of a subject 
polypeptide to its ligand or substrate, or that specifically inhibit binding of a subject 
polypeptide as a substrate to another molecule. Moreover, corresponding nucleic acid 
libraries are also provided, comprising polynucleotide sequences that encode the 
antibodies or antibody fragments described above. 

Polypeptides 

Peptides and Modified Peptides 

[0389] In some embodiments of the present invention, the active agent is a 
peptide. Suitable peptides include peptides of from about 3 amino acids to about 50, 
from about 5 to about 30, or from about 10 to about 25 amino acids in length. In 
some embodiments, a peptide has a sequence of from about 3 amino acids to about 
50, from about 5 to about 30, or from about 10 to about 25 amino acids of a 
corresponding naturally-occurring protein. In some embodiments, a peptide exhibits 
one or more of the following activities: inhibits binding of a subject polypeptide to an 
interacting protein or other molecule; inhibits subject polypeptide binding to a second 
polypeptide molecule; inhibits a signal transduction activity of a subject polypeptide; 
inhibits an enzymatic activity of a subject polypeptide; or inhibits a DNA binding 
activity of a subject polypeptide. 
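
By way of non-limiting illustration, candidate peptides within the length ranges recited above can be enumerated from a protein sequence as in the following Python sketch. The function name `peptide_windows` and the twelve-residue example sequence are hypothetical and are not taken from the Sequence Listing.

```python
from typing import Iterator


def peptide_windows(protein: str, min_len: int = 3, max_len: int = 50) -> Iterator[str]:
    """Yield every contiguous peptide of min_len..max_len residues."""
    if min_len < 1 or max_len < min_len:
        raise ValueError("require 1 <= min_len <= max_len")
    n = len(protein)
    # A window cannot be longer than the protein itself.
    for length in range(min_len, min(max_len, n) + 1):
        for start in range(0, n - length + 1):
            yield protein[start:start + length]


# Hypothetical 12-residue sequence used only to exercise the function.
toy_protein = "MKTAYIAKQRQI"
peptides = list(peptide_windows(toy_protein, min_len=10, max_len=25))
print(len(peptides), peptides[0])
```

The defaults correspond to the broadest recited range (about 3 to about 50 residues); the narrower ranges are obtained by changing `min_len` and `max_len`.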

[0390] This invention provides novel polypeptides, and related polypeptide 
compositions. The novel polypeptides of the invention encompass proteins with 
amino acid sequences as shown in SEQ ID NOS.: 210 - 418, or encoded by the 
nucleic acids having nucleotide sequences shown in SEQ ID NOS.: 1 - 209 and 419 - 
627. The subject polypeptides are human polypeptides, fragments thereof, variants 
(such as splice variants), homologs from other species, and derivatives thereof. In 
particular embodiments, a polypeptide of the invention has an amino acid sequence 
substantially identical to the sequence of any polypeptide encoded by a 
polynucleotide sequence shown in SEQ ID NOS.: 1 - 209 and 419 - 627. 

[0391] Peptides can include naturally-occurring and non-naturally occurring 
amino acids. Peptides can comprise D-amino acids, a combination of D- and L-amino 
acids, and various "designer" amino acids (e.g., β-methyl amino acids, Cα-methyl 
amino acids, and Nα-methyl amino acids, etc.) to convey special properties. 
Additionally, peptides can be cyclic. Peptides can include non-classical amino acids 
in order to introduce particular conformational motifs. Any known non-classical 
amino acid can be used. Non-classical amino acids include, but are not limited to, 
1,2,3,4-tetrahydroisoquinoline-3-carboxylate; (2S,3S)-methylphenylalanine, 
(2S,3R)-methyl-phenylalanine, (2R,3S)-methyl-phenylalanine and 
(2R,3R)-methyl-phenylalanine; 2-aminotetrahydronaphthalene-2-carboxylic acid; 
hydroxy-1,2,3,4-tetrahydroisoquinoline-3-carboxylate; β-carboline (D and L); HIC 
(histidine isoquinoline carboxylic acid); and HIC (histidine cyclic urea). Amino acid 
analogs and peptidomimetics can be incorporated into a peptide to induce or favor 
specific secondary structures, including, but not limited to, LL-Acp 
(LL-3-amino-2-propenidone-6-carboxylic acid), a β-turn inducing dipeptide analog; 
β-sheet inducing analogs; β-turn inducing analogs; α-helix inducing analogs; γ-turn 
inducing analogs; Gly-Ala turn analogs; amide bond isostere; or tetrazol, and the like. 

[0392] A peptide can be a depsipeptide, which can be linear or cyclic (Kuisle 
et al., 1999). Linear depsipeptides can comprise rings formed through S-S bridges, or 
through an hydroxy or a mercapto group of an hydroxy-, or mercapto-amino acid and 
the carboxyl group of another amino- or hydroxy-acid but do not comprise rings 
formed only through peptide or ester links derived from hydroxy carboxylic acids. 
Cyclic depsipeptides contain at least one ring formed only through peptide or ester 
links, derived from hydroxy carboxylic acids. 

[0393] Peptides can be cyclic or bicyclic. For example, the C-terminal 
carboxyl group or a C-terminal ester can be induced to cyclize by internal 
displacement of the -OH or the ester (-OR) of the carboxyl group or ester respectively 
with the N-terminal amino group to form a cyclic peptide. For example, after 
synthesis and cleavage to give the peptide acid, the free acid is converted to an 
activated ester by an appropriate carboxyl group activator such as 
dicyclohexylcarbodiimide (DCC) in solution, for example, in methylene chloride 
(CH2Cl2), dimethyl formamide (DMF) mixtures. The cyclic peptide is then formed by 
internal displacement of the activated ester with the N-terminal amine. Internal 
cyclization as opposed to polymerization can be enhanced by use of very dilute 
solutions. Methods for making cyclic peptides are well known in the art. 

[0394] A desamino or descarboxy residue can be incorporated at the terminal 
ends of the peptide, so that there is no terminal amino or carboxyl group, to decrease 
susceptibility to proteases or to restrict conformation. C-terminal functional groups 
include amide, amide lower alkyl, amide di (lower alkyl), lower alkoxy, hydroxy, and 
carboxy, and the lower ester derivatives thereof, and the pharmaceutically acceptable 
salts thereof. 

[0395] In addition to the foregoing N-terminal and C-terminal modifications, 
a peptide or peptidomimetic can be modified with or covalently coupled to one or 
more of a variety of hydrophilic polymers to increase solubility and circulation 
half-life of the peptide. Suitable nonproteinaceous hydrophilic polymers for coupling to a 
peptide include, but are not limited to, polyalkylethers as exemplified by polyethylene 
glycol and polypropylene glycol, polylactic acid, polyglycolic acid, polyoxyalkenes, 
polyvinylalcohol, polyvinylpyrrolidone, cellulose and cellulose derivatives, dextran, 
and dextran derivatives. Generally, such hydrophilic polymers have an average 
molecular weight ranging from about 500 to about 100,000 daltons, from about 2,000 
to about 40,000 daltons, or from about 5,000 to about 20,000 daltons. The peptide 
can be derivatized with or coupled to such polymers using any of the methods set 
forth in Zalipsky, 1995; Monferdini et al., 1995; U.S. Pat. Nos. 4,640,835; 4,496,689; 
4,301,144; 4,670,417; 4,791,192; 4,179,337, or WO 95/34326. 

[0396] These polypeptides may reside within the cell, or extracellularly. 
They may be secreted from the cell, reside in the cytoplasm, in the membranes, or in 
any of the intracellular organelles, including the nucleus, mitochondria, ribosomes, or 
storage granules. 

[0397] In many embodiments, a novel polypeptide of the invention functions 
as a secreted protein, a single-transmembrane protein, a multiple-transmembrane 
protein, a kinase, a protein kinase, a ligase, a nuclear hormone receptor, a 
phosphatase, a protease, a phosphodiesterase, a kinesin, an immunoglobulin, a T-cell 
receptor, or a glycosylphosphatidylinositol anchor. A novel polypeptide of the 
invention can also possess one or more of the following functions or properties: (1) 
an activator functioning to regulate one or more genes by increasing the rate of 
transcription, (2) an activator functioning to positively modulate an allosteric enzyme, 
(3) an adaptor functioning to sort cargo molecules into transport vesicles, (4) an 
adaptor functioning to form a clathrin-coated vesicle, (5) an adhesion molecule 
functioning to mediate the adhesion of cells with other cells and/or the extracellular 
matrix, (6) an ATPase functioning to move ions or small molecules across a 
membrane against a chemical concentration gradient or electrical potential, (7) an 
ATPase functioning to translocate nucleotides across membranes, (8) a 
breakpoint-related sequence functioning as an oncoprotein, (9) a breakpoint-related sequence 
functioning as a tumor-specific antigen, (10) a channel functioning as a water channel, 
(11) a channel functioning as an ion channel, (12) a checkpoint-related sequence 
functioning at DNA damage checkpoints, (13) a checkpoint-related sequence 
functioning at replication checkpoints, (14) a checkpoint-related sequence functioning 
to initiate signal transduction cascades eliciting cell cycle arrest, DNA repair, or 
apoptosis, (15) a complex functioning as a protein scaffold, (16) a complex 
functioning in ADP-ribosylation, (17) a dehydrogenase functioning to synthesize 
amino acids, (18) a disintegrin functioning to inhibit blood clotting, (19) a disintegrin 
functioning as a metallopeptidase, (20) a GTPase functioning as a negative regulator 
of p53, (21) a GTPase functioning to stimulate ras GTPase activity, (22) a helicase 
functioning in DNA replication, (23) a hydrolase functioning in propionate 
metabolism, (24) an integrase functioning to integrate a DNA copy of a retroviral 
genome into a host chromosome, (25) an integrin functioning as a tumor marker, (26) 
an integrin functioning in cell migration, (27) an isomerase functioning as an 
immunosuppressant, (28) a membrane protein functioning as a scaffolding component 
at the cytoplasmic face of a lipid raft, (29) a membrane protein functioning as a ligand 
for a receptor tyrosine kinase, (30) oxygenases and peroxidases functioning as 
antioxidants, (31) a phospholipase functioning in eicosanoid synthesis, (32) a 
phospholipase functioning in preserving the intestinal mucosa, (33) a prosaposin 
functioning in lipid catabolism, (34) a proteasome component functioning in muscle 
wasting, (35) a reductase-related sequence functioning as a coenzyme A reductase 
inhibitor, (36) a reverse transcriptase functioning as an RNA-dependent reverse 
transcriptase, (37) a reverse transcriptase functioning as a DNA-dependent reverse 
transcriptase, (38) an RNase functioning in viral assembly, (39) an RNase H 
functioning to form oligonucleotides that prime DNA synthesis, (40) an RNase H 
functioning to cleave the RNA strand of an RNA-DNA hybrid, (41) SH3 domains 
functioning in actin cytoskeletal organization, (42) SH3 domains functioning in signal 
transduction, (43) a synthetase functioning as an autoantigen, (44) synthetases 
functioning in nucleotide sugar phosphate synthesis, (45) TATA boxes functioning as 
transcription initiators, (46) tat functioning as a transcriptional coactivator, (47) 
transferases functioning in signal transduction, (48) transposases functioning as gene 
transfer agents, (49) ubiquitins functioning to protect cells against tumor necrosis 
factor induced cell death, (50) proteasome components and ubiquitin functioning in 
protein degradation, (51) a virus-related sequence functioning to confer resistance to 
infection by viruses, (52) other sequences of the invention interacting with one or 
more proteins, (53) other sequences of the invention enzymatically modifying one or 
more proteins, (54) other sequences of the invention binding one or more small 
molecule ligands, (55) other sequences of the invention binding one or more peptides, 
(56) other sequences of the invention binding one or more carbohydrates, and (57) 
other sequences of the invention functioning in vesicular transport. 

[0398] In some embodiments, the present novel polypeptide modulates the 
cells or tissues of animals, particularly humans, such as, for example, by stimulating, 
enhancing or inhibiting T or B cell function or the function of other hematopoietic 
cells or bone marrow cells; modulates adult or embryonic stem cell or precursor cell 
growth or differentiation; modulates cell function or activity of neuronal cells or other 
cells of the CNS, heart cells, liver cells, kidney cells, lung cells, pancreatic cells, 
gastrointestinal cells, spleen cells, breast cells, prostate cells, ovarian cells, and the 
like. 

[0399] In some embodiments, a subject polypeptide is present as a multimer. 
Multimers include homodimers, homotrimers, homotetramers, and multimers that 
include more than four monomeric units. Multimers also include heteromultimers, 
e.g., heterodimers, heterotrimers, heterotetramers, etc. where the subject polypeptide 
is present in a complex with proteins other than the subject polypeptide. Where the 
multimer is a heteromultimer, the subject polypeptide can be present in a 1:1 ratio, a 
1:2 ratio, a 2:1 ratio, or other ratio, with the other protein(s). 

[0400] In addition to the above specifically listed proteins, polypeptides 
from other species are also provided, including mammals, such as: primates, rodents, 
e.g., mice, rats, hamsters, guinea pigs; domestic animals, e.g., sheep, pig, horse, cow, 
goat, rabbit, dog, cat, and humans, as well as non-mammalian species, e.g., avian, 
reptile and amphibian, insect, crustacean, fish, plant, fungus, and protozoa. 

[0401] By "homolog" is meant a protein having at least about 35%, at least 
about 40%, at least about 60%, at least about 70%, at least about 75%, at least about 
80%, at least about 85%, at least about 90%, or at least about 95%, or higher, amino 
acid sequence identity to the reference polypeptide, as measured with the "GAP" 
program (part of the Wisconsin Sequence Analysis Package available through the 
Genetics Computer Group, Inc. (Madison WI)), where the parameters are: Gap 
weight: 12; length weight: 4. In many embodiments of interest, homology will be at 
least about 75%, at least about 80%, or at least 85%, where in certain embodiments of 
interest, homology will be as high as about 90%. 
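
By way of non-limiting illustration, the percent identity comparison described above can be sketched in Python as follows. The sketch does not reproduce the GAP program or its gap weights; it assumes that a pairwise alignment has already been produced by such a program and is supplied as two equal-length strings in which gaps are written as "-". The function names, the example sequences, and the choice to count only columns in which both sequences have a residue are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the aligned columns of a pairwise alignment.

    Assumption: columns containing a gap ('-') in either sequence are
    excluded from both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 75.0) -> bool:
    """True if the alignment reaches the stated percent identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical pre-aligned sequences (one mismatch, one gap).
reference = "MKTAYIAKQR"
candidate = "MKTAYLAK-R"
print(round(percent_identity(reference, candidate), 1))
print(meets_threshold(reference, candidate, 85.0))
```

Other conventions for the denominator exist (for example, the length of the shorter sequence or the full alignment length), and the reported percentage depends on which is used.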

[0402] Also provided are polypeptides that are substantially identical to 
at least one amino acid sequence shown in the Sequence Listing, or a fragment 
thereof, where by "substantially identical" is meant that the protein has an amino acid 
sequence identity to the reference sequence of at least about 75%, at least about 80%, 
at least about 85%, at least about 90%, at least about 95%, at least about 97%, at least 
about 98%, or at least about 99%. 

[0403] The proteins of the subject invention (e.g., polypeptides encoded by 
the nucleotide sequences shown in SEQ ID NOS.: 1 - 209 and 419 - 627, and 
polypeptide sequences shown in SEQ ID NOS.: 210 - 418) have been separated from 
their naturally occurring environment and are present in a non-naturally occurring 
environment. In certain embodiments, the proteins are present in a composition 
where they are more concentrated than in their naturally occurring environment. For 
example, purified polypeptides are provided. 

[0404] In addition to naturally occurring proteins, polypeptides that vary 
from naturally occurring forms are also provided. Fusion proteins can comprise a 
subject polypeptide, or fragment thereof, and a polypeptide other than a subject 
polypeptide ("the fusion partner") fused in-frame at the N-terminus and/or C-terminus 
of the subject polypeptide, or internally to the subject polypeptide. 

[0405] Suitable fusion partners include, but are not limited to, 
immunologically detectable proteins (e.g., epitope tags, such as hemagglutinin, 
FLAG, and c-myc); polypeptides that provide a detectable signal or that serve as 
detectable markers (e.g., a fluorescent protein, e.g., a green fluorescent protein, a 
fluorescent protein from an Anthozoan species; β-galactosidase; luciferase; cre 
recombinase; and the like); polypeptides that provide a catalytic function or induce a 
cellular response; polypeptides that provide for secretion of the fusion protein from a 
eukaryotic cell; polypeptides that provide for secretion of the fusion protein from a 
prokaryotic cell; polypeptides that provide for binding to metal ions (e.g., Hisn, where 
n = 3-10, e.g., 6His) and structural proteins. Fusion partners can also be those that are 
able to stabilize the present polypeptide, such as polyethylene glycol ("PEG") and a 
fragment of an immunoglobulin, such as the Fc fragment of IgG, IgE, IgA, IgM, 
and/or IgD. 

[0406] Detection methods are chosen based on the detectable fusion partner. 
For example, where the fusion partner provides an immunologically recognizable 
epitope, an epitope-specific antibody can be used to quantitatively detect the level of 
polypeptide. In some embodiments, the fusion partner provides a detectable signal, 
and in these embodiments, the detection method is chosen based on the type of signal 
generated by the fusion partner. For example, where the fusion partner is a 
fluorescent protein, fluorescence is measured. 

[0407] Where the fusion partner is an enzyme that yields a detectable 
product, the product can be detected using an appropriate means. For example, 
β-galactosidase can, depending on the substrate, yield a colored product that can be 
detected with a spectrophotometer, and the enzyme luciferase can yield a 
luminescent product detectable with a luminometer. 

[0408] In some embodiments, a polypeptide of the invention comprises at 
least about 5, at least about 8, at least about 10, at least about 15, at least about 18, at 
least about 20, at least about 25, at least about 30, at least about 50, at least about 75, 
at least about 100, at least about 150, at least about 200, at least about 250, at least 
about 300, at least about 350, at least about 400, at least about 450, at least about 500, 
at least about 550, at least about 600, at least about 650, at least about 700, at least 
about 750, at least about 800, at least about 850, at least about 900, at least about 950, 
or at least about 1000 contiguous amino acid residues of at least one of the sequences 
according to SEQ ID NOS.: 210 - 418, up to and including the entire amino acid 
sequence. 

[0409] Fragments of the subject polypeptides, as well as polypeptides 
comprising such fragments, are also provided. Fragments of polypeptides of interest 
will typically be at least about 5, at least about 8, at least about 10, at least about 15, at 
least about 18, at least about 20, at least about 25, at least about 30, at least about 50, 
at least about 75, at least about 100, at least about 150, at least about 200, at least 
about 250, or at least 300 aa in length or longer, where the fragment will have a 
stretch of amino acids that is identical to the subject protein of at least about 5, at least 
about 8, at least about 10, at least about 15, at least about 18, at least about 20, at least 
about 25, at least about 30, or at least about 50 aa in length. 
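
By way of non-limiting illustration, whether a candidate fragment shares an identical contiguous stretch of a given length with a subject protein can be checked as in the following Python sketch, which computes the longest common substring by dynamic programming. The function names and the example sequences are hypothetical and are not taken from the Sequence Listing.

```python
def longest_shared_stretch(fragment: str, protein: str) -> int:
    """Length of the longest contiguous run of residues common to both."""
    best = 0
    # previous[j] holds the length of the shared run ending at the previous
    # fragment residue and protein residue j (1-based).
    previous = [0] * (len(protein) + 1)
    for residue in fragment:
        current = [0] * (len(protein) + 1)
        for j, other in enumerate(protein, start=1):
            if residue == other:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def has_identical_stretch(fragment: str, protein: str, min_len: int = 5) -> bool:
    """True if fragment and protein share at least min_len identical contiguous residues."""
    return longest_shared_stretch(fragment, protein) >= min_len


# Hypothetical sequences: the fragment shares the run "AYIAKQ" with the protein.
subject_protein = "MKTAYIAKQRQI"
candidate_fragment = "GGAYIAKQGG"
print(longest_shared_stretch(candidate_fragment, subject_protein))
print(has_identical_stretch(candidate_fragment, subject_protein, min_len=5))
```

The default `min_len` of 5 corresponds to the shortest stretch recited above; longer stretches are checked by raising it.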

[0410] In some embodiments, fragments exhibit one or more activities 
associated with a corresponding naturally occurring polypeptide. Fragments find 
utility in generating antibodies to the full-length polypeptide; and in methods of 
screening for candidate agents that bind to and/or modulate polypeptide activity. 
Specific fragments of interest include those with enzymatic activity, those with 
biological activity including the ability to serve as an epitope or immunogen, and 
fragments that bind to other proteins or to nucleic acids. 

[0411] The invention provides polypeptides comprising such fragments, 
including, e.g., fusion polypeptides comprising a subject polypeptide fragment fused 
in frame (directly or indirectly) to another protein (the "fusion partner"), such as the 
signal peptide of one protein being fused to the mature polypeptide of another protein. 
Such fusion proteins are typically made by linking the encoding polynucleotides 
together in a vector or cassette. Suitable fusion partners include, but are not limited 
to, immunologically detectable proteins (e.g., epitope tags, such as hemagglutinin, 
FLAG, and c-myc); polypeptides that provide a detectable signal or that serve as 
detectable markers (e.g., a fluorescent protein, e.g., a green fluorescent protein, a 
fluorescent protein from an Anthozoan species; β-galactosidase; luciferase; cre 
recombinase); polypeptides that provide a catalytic function or induce a cellular 
response; polypeptides that provide for secretion of the fusion protein from a 
eukaryotic cell; polypeptides that provide for secretion of the fusion protein from a 
prokaryotic cell; polypeptides that provide for binding to metal ions (e.g., Hisn, where 
n = 3-10, e.g., 6His) and structural proteins. Fusion partners can also be those that are 
able to stabilize the present polypeptide, such as polyethylene glycol ("PEG") and a 
fragment of an immunoglobulin, such as the Fc fragment of IgG, IgE, IgA, IgM, 
and/or IgD. 

Polypeptide Preparation 

[0412] Polypeptides of the invention can be obtained from naturally-occurring 
sources or produced synthetically. The sources of naturally occurring 
polypeptides will generally depend on the species from which the protein is to be 
derived, i.e., the proteins will be derived from biological sources that express the 
proteins. The subject proteins can also be derived from synthetic means, e.g., by 
expressing a recombinant gene encoding a protein of interest in a suitable system or 
host or enhancing endogenous expression, as described in more detail above. Further, 
small peptides can be synthesized in the laboratory by techniques well known in the 
art. 

[0413] In all cases, the product can be recovered by any appropriate means 
known in the art. For example, convenient protein purification procedures can be 
employed (e.g., see Guide to Protein Purification, Deutscher et al., 1990). That is, a 
lysate can be prepared from the original source, (e.g., a cell expressing endogenous 
polypeptide, or a cell comprising the expression vector expressing the polypeptide(s)), 
and purified using HPLC, exclusion chromatography, gel electrophoresis, or affinity 
chromatography, and the like. 

[0414] The invention thus also provides methods of producing polypeptides. 
Briefly, the methods generally involve introducing a nucleic acid construct into a host 
cell in vitro and culturing the host cell under conditions suitable for expression, then 
harvesting the polypeptide, either from the culture medium or from the host cell (e.g., 
by disrupting the host cell), or both, as described in detail above. The invention also 
provides methods of producing a polypeptide using cell-free in vitro 
transcription/translation methods, which are well known in the art, also as provided 
above. 

[0415] Moreover, the invention provides polypeptides, including polypeptide 
fragments, as targets for therapeutic intervention, including use in screening assays, 
for identifying agents that modulate polypeptide level and/or activity, and as targets 
for antibody and small molecule therapeutics, for example, in the treatment of 
disorders. 

Kits 

[0416] The present invention provides kits for diagnosing disease states 
based on the detected presence and/or level of polynucleotide or polypeptide in a 
biological sample, and/or the detected presence and/or level of biological activity of 
the polynucleotide or polypeptide. The invention further provides kits for detecting 
the presence and/or a level of a polynucleotide or polypeptide in a biological sample 
and/or the detected presence and/or level of biological activity of the 
polynucleotide or polypeptide. Procedures using these kits can be performed by 
clinical laboratories, experimental laboratories, medical practitioners, or private 
individuals. 



[0417] The kits of the invention will comprise a molecule of the invention. 
The kits for detecting a polynucleotide will also comprise a moiety that specifically 
hybridizes to a polynucleotide of the invention. The polynucleotide molecule can be 
of any length. For example, it can comprise a polynucleotide of at least 6, at least 7, 
at least 8, or at least 9 contiguous nucleotides of a molecule of the invention. Kits of 
the invention for detecting a subject polypeptide will comprise a moiety that 
specifically binds to a polypeptide of the invention; the moiety includes, but is not 
limited to, a polypeptide-specific antibody. 

[0418] The kits are useful in diagnostic applications. For example, the kit is 
useful to determine whether a given DNA sample isolated from an individual 
comprises an expressed nucleic acid, a polymorphism, or other variant. 

[0419] Kits for detecting polynucleotides comprise a pair of nucleic acids in 
a suitable storage medium, e.g., a buffered solution, in a suitable container. The pair 
of isolated nucleic acid molecules serve as primers in an amplification reaction (e.g., a 
polymerase chain reaction). The kit can further include additional buffers, reagents 
for polymerase chain reaction (e.g., deoxynucleotide triphosphates (dNTP), a 
thermostable DNA polymerase, a solution containing Mg2+ ions (e.g., MgCl2), and 
other components well known to those skilled in the art for carrying out a polymerase 
chain reaction). The kit can further include instructions for use, which may be 
provided in a variety of forms, e.g., printed information, or compact disc, and the like. 
The kit may further include reagents necessary to extract DNA from a biological 
sample and reagents for generating a cDNA copy of an mRNA. The kit may 
optionally provide additional useful components, including, but not limited to, 
buffers, developing reagents, labels, reacting surfaces, means for detection, control 
samples, standards, and interpretive information. 

[0420] In some embodiments, a kit of the invention for detecting a 
polynucleotide, such as an mRNA encoding a polypeptide, comprises a pair of nucleic 
acids that function as "forward" and "reverse" primers that specifically amplify a 
cDNA copy of the mRNA. The "forward" and "reverse" primers are provided as a 
pair of isolated nucleic acid molecules, each from about 10 to about 200 nucleotides 
in length, the first nucleic acid molecule of the pair comprising a sequence of at least 
about 10 contiguous nucleotides having 100% sequence identity to a nucleic acid 
sequence shown in SEQ ID NOS.: 1 - 209 and 419 - 627, and the second nucleic 
acid molecule of the pair comprising a sequence of at least about 10 contiguous 
nucleotides having 100% sequence identity to the reverse complement of a nucleic 
acid sequence shown in SEQ ID NOS.: 1 - 209 and 419 - 627, wherein the sequence 
of the second nucleic acid molecule is located 3' of the nucleic acid sequence of the 
first nucleic acid molecule. The primer nucleic acids are prepared using any known 
method, e.g., automated synthesis. In some embodiments, one or both members of 
the pair of nucleic acid molecules comprise a detectable label. The kit may include 
blocking reagents, buffers, and reagents for developing and/or detecting the detectable 
label. The kit may also include instructions for use, controls, and interpretive 
information. 
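
By way of non-limiting illustration, the relationship between the "forward" and "reverse" primers described above can be sketched in Python as follows. The sketch makes simplifying assumptions: each primer consists entirely of target-derived sequence, only the first occurrence of each primer site in the target is considered, and the target sequence shown is hypothetical and is not taken from the Sequence Listing. The function names are likewise hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_valid_primer_pair(target: str, forward: str, reverse: str,
                         min_len: int = 10) -> bool:
    """Check that a primer pair brackets a region of the target.

    The forward primer must be identical to a stretch of the target, the
    reverse primer must be identical to the reverse complement of a stretch
    of the target, and the reverse primer site must lie 3' of the forward
    primer site.
    """
    target = target.upper()
    forward = forward.upper()
    reverse = reverse.upper()
    if len(forward) < min_len or len(reverse) < min_len:
        return False
    fwd_start = target.find(forward)
    # The reverse primer matches the target where its reverse complement occurs.
    rev_start = target.find(reverse_complement(reverse))
    if fwd_start < 0 or rev_start < 0:
        return False
    return rev_start >= fwd_start + len(forward)


# Hypothetical 42-nucleotide target used only to exercise the functions.
target_seq = "ATGGCTAGCAAGGGCGAGGAACTGTTCACCGGTGTGGTGCCC"
forward_primer = target_seq[:12]
reverse_primer = reverse_complement(target_seq[-12:])
print(forward_primer, reverse_primer)
print(is_valid_primer_pair(target_seq, forward_primer, reverse_primer))
```

A practical primer design would also weigh melting temperature, secondary structure, and specificity, none of which is modeled here.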

[0421] Where the kit provides for detecting enzymatic activity, it includes a 
substrate that provides for a detectable product when acted upon by a polypeptide of 
interest. The kit may further include reagents necessary to detect and develop the 
detectable marker. 

[0422] The present invention provides for kits with unit doses of an active 
agent. These agents are described in more detail below. In some embodiments, the 
agent is provided in oral or injectable doses. Such kits will comprise containers 
containing the unit doses and an informational package insert describing the use and 
attendant benefits of the drugs in treating a condition of interest. 
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Table 3. Characteristics of the Fantom Mouse Proteia Wifli the Highest Degree 
of Similarity to the Claimed Sequences 



FPm 


A'AUWIIl XU|I Alll rVILIlOLilUOIl 


HG1000214N0_160000^ene_predictio 
nl 


pro-o lynipiiocyie gene i [JVlus musculus] . 


HG1000323NO 160000_gene_predictio 
nl 


iipoproiem lipase [Mus musculusj 


HG1000323N0_160000.^e_ptedictio 
n2 


similar to procollagen, type V, alpha 2 [Mus 

niuscuiusj 


HG1000327N0_1000_£enejpredictionl 


unnamed protein product [Mus musculus] 


liCji 0(K)327N0_1 60000_gene jredictio 
nl 


unnamed protein product [Mus musculus] 


HG100(H34N0_160()00jgene_predictio 
nl 


uromodulin; Tamm-Horsfall glycoprotein [Mus 
musculus] 


HG1000449N0 160000_genej»redictio 
nl 


trefoil &ctor 1 [Mus musculus] 


HGIOOOSOTNO 160000Lgene_piedictio 
nl 


IGFBP-Iike protein [Mus musculus] 


HGl 000807N0_5000_£ene_predictionl 


gi|9055246|ref|NP_06121 Ll| IGFBP-like 
protein [Mus musculus] 


HG1001280N0_160000_^e_predictio 

nl 


gi|26336763|dbj|BAC32064.1| mmamed protein 
product [Mus musculus] 


HG1000193N0_160000_gene_predictio 
nl 


gil21595011|gb|AAH31409.1| RIKENcDNA 
241 0030007 gene [Mus musculus] 


HGl 000286N0_160000_^e_piedictio 
nl 


gi|303678|dbj|BAA02298.1| 47-kDa heat shock 

Drotein [Mus musculus] 


HG1000569N0 160000_gene_predictio 
nl 


gi|208819831reflXP_122793.1| similar to heat- 
stable antigen-related hypothetical protein 
ISA-C - mouse [Mus musculus] 


HG1000992N0 160000 _£ene_predictio 
nl 


gi|26331916|dbj|BAC29688.1| unnamed protein 
product [Mus musculus] 


wrjinni i^sxrn i#^aaaa ^^^^ 

xiu 1 uu I i JN u 1 ouuuU_gene jredictio 

nl 


gi|6752962|ref|NP_033744,l| a disinlegrin and 
metalloprotease domain IS (metargidin); a 
disintegrin and metalloprbteinase domain 
(ADAM) 15 (metargidin) [Mus musculus] 


HG100118SN0 160000 eene oredictio 
n2 . 


/ o jjciDj|x>A^zooj 1 . 1 1 unnameci protem 
product [Mus musculus] 


HG1001280N0_5000_gene_predictionl 


gi|26336763|dbj|BAC32064.1| unnamed protein 
product [Mus musculus] 


HG1001302N0_160000_gene_predictio 
n2 , 


5i|20136122|gb|AAMl 1539.11 matoilin-2 [Mus 
nusculus] 


HG1000361N0_160000.^e_predictio { 
nl ( 


>i|20867549|reflXP_125932.1| RKEN cDNA 
)030421L1 1 [Mus musculus] 
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HG1000361N0 20000_gene__prediction 
1 


gi|26330472|dbj|BAC28966.1| imnamed protein 
pnxluct [Mus musculus] 


HG1000792N0 160000_^aie_pn5dictio 
nl 


gi|272291 18|refINP_082129.2| RKEN cDNA 
0610006F02 (Mus muscidus] 


HG1000934N0 160000_gene_predictio 
nl 


gi!20867549|refp{P_125932.1| RIKEN cDNA 
9030421 L1 1 rMn<: nni«ni1ii«1 


HG1000976N0 160000_jene_piedictio 
nl 


gi|11967965|ref|NP_071879.1| cytochrome 

P4-S0 ^llHflimil V TVT? rknl\mf>nfi/lA 1 A 
•I- owyidlillijr 1 VX^j pUlVpcpiiQC l*f 

(leukotriene B4 omega hydroxylase) [Mus 
musculus] 


HG1000992N0 10000^ene_prediction 
1 


gi|26331916|dbj|BAC29688.1| unnamed protein 
product [Mus musculus] 


HGIWI 185N0_1000_gene_predictionl 


gi|26329785|dbj|BAC28631.1| unnamed protein 
product [Mus musculus] 


HG1001185N0 160000_ge(ne_predictio 
nl 


gi|26329785|dbj|BAC2863 Ll| unnamed protein 
product [Mus musculus] 


HGlOOl 1 85N0_1 000_gene_prediction2 


gi|26329785|dbj|BAC28631.1| unnamed protein 
product [Mus musculus] 


HGlOOl 1 85N0_5000_gene_predictionl 


gi|26329785|dbjIBAC28631.1| unnamed protein 
product [Mus musculus] 


HG1001280N0 10000^ene_prediction 
1 


gi|26336763|dbj|BAC32064.1| unnamed protein 
product [Mus musculus] 


HG1000361N0 10000_gene_prediction 
1 


gi|26330472|dbj|BA<:28966.1| unnamed protein 
product [Mus musculus] 


HG1001381Nd_1000^ene_predictionl 


gi|26343077|dbj|BAC35I95.1| unnamed protein 
product [Mus musculus] 


HG1000263NO__5000jgene_predicti<Mil 


gi|26360198|dbj|BAB25612.2| unnamed protein 
product [Mus musculus] 


HG1001052N0_0_gene_predictionl 


gi|20072693|gb|AAH272?7,l| Sinnlarto cyclin 
K [Mus musculus] 


HG1000498NO 1600()0^ene_predictio 
nl 


gi|26352844|dbj|BAC>M)052,l| unnamed protein 
Droduct [Mus musculus] 


HG1000579N0 160000_genejpredidio 


product [Mus musculus] 


HG1000685N0 160000_genejpiedictio 
nl 


voltage dependent, alpha2/delta subunit 3; 
alpha 2 delta-3 [Mus musculus] 


HG1000191N0_160000_gene_predictio
n1


gi|13385832|ref|NP_080608.1| RIKEN cDNA 
1810055D05 [Mus musculus] 


HG1000296N0_160000_gene_predictio
n2


gi|25054735|ref|XP_192839.1| ATPase, class II,
type 9B [Mus musculus] 


HG1000346N0_1000_gene_prediction1


gi|26330504|dbj|BAC28982.1| unnamed protein
product [Mus musculus]


HG1000963N0_5000_gene_prediction1


gi|12963665|ref|NP_075892.1| mesoderm
development candidate 2; RIKEN cDNA
221001501 1 gene [Mus musculus]


HG1000610N0_160000_gene_predictio
n1


product [Mus musculus]


gi|20881983|ref|XP_122793.1| similar to heat-
stable antigen-related hypothetical protein
HSA-C - mouse [Mus musculus]


HG1000342N0 160000_gene_predictio 
n2 


stable antigen-related hypothetical protein 
HSA-C - mouse [Mus musculus] 


HG1000650N0_20000_gene_prediction
1 


gi|20270210|ref|NP_083847.1| RIKEN cDNA
1110001A12 [Mus musculus]


HG1000191N0_160000_gene_predictio
n2 


gi|13385832|ref|NP_080608.1| RIKEN cDNA 
1810055D05 [Mus musculus] 


HG1000449N0_160000_gene_predictio
nS 


gi|6755773|ref|NP_035705.1| trefoil factor 3,
intestinal [Mus musculus] 


xIOIUUUIoXInU zUUUU^jgenej)re<uction 

1 


gi|zo334 / jd|cloj|i>AC3 1 U /o. 1 1 unnamed protem 
product [Mus musculus] 


HG1001058N0_160000_gene_predictio
n1


gi|z03442o2]rei|AJk'_iluyDy.l| suniiarto 
LD31582p [Drosophila melanogaster] [Mus
musculus] 


HG1000187N0_160000_gene_predictio
n2


gi|26346705|dbj|BAC37001.1| unnamed protein
product [Mus musculus]


HG1000191N0_1000_gene_prediction1


gi|13385832|ref|NP_080608.1| RIKEN cDNA
1810055D05 [Mus musculus]


iiu 1 uuui 1 yw u 1 ouuuu gene predicno 
nl 


gi|z jUz l4Do|rei|Ar_2U /you. 1 1 suniiar to . 
pORF2 [Mus musculus domesticus] 


HG1000137N0_0_gene_prediction1


gi|/Uo43 /oy|rei|AJr__i33oi4.i| smniar to 
hypothetical protein IMAGE3455200 [Homo 
sapiens] [Mus musculus] 


HG1000191N0_5000_gene_predictionl 


gi|12842346|dbj|BAB25565.1| unnamed protein
product [Mus musculus] 


HG1000622NO 160000_gene_predictio 
nl 


gi|25022040|ref|XP_204233.1| similar to ORF2
[Mus musculus domesticus] 


HG1000390N0_1000_gene_prediction1


gi|20892585|ref|XP_147977.1| RIKEN cDNA 
2610001E17 [Mus musculus] 


HG1001350N0_5000_gene_prediction1


gi|13386102|ref|NP_080892.1| RIKEN cDNA
1500026D16 [Mus musculus] 


nvj i vyjuDjc / vh vi_i uvA^uu__gcnc_prvUicuio 
ii2 


product [Mus musculus] 


HG1000179N0_160000_gene_predictio
n1


gi|20862121|ref|XP_146270.1| similar to
putative alpha 1,3-fucosyl transferase [Mus 
musculus] 


HG1000806N0_20000_gene_prediction
1


gi|23592855|ref|XP_129487.2| hypothetical
protein MGC40674 [Mus musculus]


HG1000991N0_160000_gene_predictio
n1


gi|6755338|ref|NP_036013.1| ring finger
protein 13 [Mus musculus]


HG1001489N0_20000_gene_prediction
1


gi|23592855|ref|XP_129487.2| hypothetical
protein MGC40674 [Mus musculus] 


xlvjr 1 uu 1 U J orNU juuu^gcuc yremuu-om 


gi|20892051|ref|XP_148657.1| similar to
Lethal(2)neighbour of tid protein 2 (NOT53)


HG1001376N0_160000_gene_predictio
n2


gi|27261816|ref|NP_080861.1| RIKEN cDNA
C530005J20 [Mus musculus] 


HG1001376N0_20000_gene_prediction
2


gi|27261816|ref|NP_080861.1| RIKEN cDNA
C530005J20 [Mus musculus]


HG1001478N0_10000_gene_prediction

1 


gi|6979907|gb|AAF34647.1|AF221103_1
kinesin-related protein KIFC5B [Mus
musculus]


HG1000806N0_160000_gene_predictio
n1


gi|23592855|ref|XP_129487.2| hypothetical
protein MGC40674 [Mus musculus]


HG1000409N0_160000_gene_predictio
nJ. 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus]


HG1000884N0_160000_gene_predictio
n1


gi|26329055|dbj|BAC28266.1| unnamed protein
product [Mus musculus]


HG1000575N0_160000_gene_predictio
n1


gi|20889984|ref|XP_129281.1| RIKEN cDNA
^voUD^ojJi / LMUsnniscuiusj 


HG1000403N0_160000_gene_predictio 
nl 


gi|26340168|dbj|BAC33747.1| unnamed protein
product [Mus musculus]


HG1000906N0_10000_gene_prediction
1 


gi|20836822|ref|XP_130277.1| similar to
Plakophilin 4 (p0071) [Mus musculus] 


HG1001201N0_160000_gene_predictio
n1


gi|26341746|dbj|BAC34535.1| unnamed protein
product [Mus musculus] 


HG1000485N0_160000_gene_predictio 
nl 


gi|23597904|ref|XP_129263.2| protein
phosphatase 1, regulatory (inhibitor) subunit 3C
[Mus musculus]


HG1000328N0_160000_gene_predictio
nx 


gi|26336731|dbj|BAC32048.1| unnamed protein
product [Mus musculus]


HG1000231N0_160000_gene_predictio
n1


gi|26341312|dbj|BAC34318.1| unnamed protein
product [Mus musculus] 


HG1001257N0_10000_gene_prediction
1 


gi|26346593|dbj|BAC36945.1| unnamed protein 
product [Mus musculus] 


HG1000026N0 5000_gene_predictionl 


gi|9506367|ref|NP_062425.1| ATP-binding
cassette, sub-family B, member 10; ATP-
binding cassette, sub-family B (MDR/TAP),
member 12; Abc-mitochondrial erythroid [Mus
musculus]


HG1000300N0_160000_gene_predictio
n1


gi|12846244|dbj|BAB27089.1| unnamed protein
product [Mus musculus]


HG1000109N0_160000_gene_predictio
n1


gi|22779909|ref|NP_690028.1| RIKEN cDNA
2700083B01 [Mus musculus]


HG1000617N0_20000_gene_prediction
1


gi|7949115|ref|NP_058079.1| Ser/Arg-related

xiuv/i^ax lixcxtiuw yj^yjmixXy ^xwllij*^l~uiUlJxi>7o"'l\/ly 

serine/arginine repetitive matrix protein 1 [Mus
musculus] 


HG1001110N0_160000_gene_predictio
nl 


gi|22779909|ref|NP_690028.1| RIKEN cDNA 
2700083B01 [Mus musculus] 


HG1001334N0 160000_gene_predictio 
nl 


gi|26332062|dbj|BAC29761.1| unnamed protein
product [Mus musculus] 


HG1001376N0 160000_gene_predictio 
n3 


gi|27261816|ref|NP_080861.1| RIKEN cDNA 
C530005J20 [Mus musculus]


R01 nnnn^f^N'n 9nnon amp mwiir^tinn 
1 


gi|9506367|ref|NP_062425.1| ATP-binding
cassette, sub-family B, member 10; ATP-
binding cassette, sub-family B (MDR/TAP),
member 12; Abc-mitochondrial erythroid [Mus
musculus] 


HG1000276N0_1000_gene_prediction1


gi|19527228|ref|NP_598768.1| DNA segment,
Chr 10, ERATO Doi 214, expressed [Mus
musculus]


HG1000822N0_160000_gene_predictio
n2


gi|6680195|ref|NP_032255.1| histone

ueaceiyiase uxn/v segment, cnr lu, Wayne 
State University 179, expressed [Mus 
musculus] 


HG1000173N0_20000_gene_prediction
1


gi|26345110|dbj|BAC36204.1| unnamed protein
product [Mus musculus] 


HG1000834N0_160000_gene_predictio
n1


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1001044N0_1000_gene_prediction1


gi|26330836|dbj|BAC29148.1| unnamed protein
product [Mus musculus] 


HG1000299N0_1000_gene_predictionl 


gi|6753882|ref|NP_034349.1| FK506 binding
protein 4 (59 kDa) [Mus musculus] 


HG1000752N0_10000_gene_prediction
1 


ml259556QRIphlAATTdn^5{7 11 Similar fn 
PTPLl-associated RhoGAP 1 [Mus musculus] 


HG1000839N0 160000_gene_predictio 
n2 


RIKEN cDNA 2310010G13 gene [Mus 
musculus] 


HG1000659N0_160000_gene_predictio
n1


gi|26333733|dbj|BAC30584.1| unnamed protein
product [Mus musculus] 


HG1000659N0_160000_gene_predictio
n2 


gi|26333733|dbj|BAC30584.1| unnamed protein 
product [Mus musculus] 


HG1000013N0_160000_gene_predictio
n1


gi|20881136|ref|XP_12€284.1| similar to sperm
antigen HCMOGT-1 [Homo sapiens] [Mus
musculus]


rivj 1 uuu 1 / ^ JN u 1 omiuu gene prcQicuo 
nl 


i^ZAJOHO 1 1 U|CIUJ |D/\V^30ZU'r, 1 1 illlilajllCU- piUlCili 

product [Mus musculus]


rivjiOOOjiONU 1 ouUUU^eiie preoictio 
nl 


gi|27462832|gb|AAO15605.1|AF462146_1
modulator of estrogen induced transcription
[Mus musculus]


HG1000360N0_20000_gene_prediction


^|/oOi /HO|gO|AAr /UJ<$4.1|Arioi/ZOo__l 

GABA-A receptor epsilon-like subunit [Mus
musculus]


HG1000178N0_10000_gene_prediction


gi|13384830|ref|NP_079706.1| RIKEN cDNA
1110066C01 [Mus musculus]


HG1000178N0_10000_gene_prediction
2


gi|13384830|ref|NP_079706.1| RIKEN cDNA
1110066C01 [Mus musculus] 


HG1000360N0_20000_gene_prediction
2 


gl|7oDl74o|gb|AAr/U3o4.1|Ar loyzo3_l 
GABA-A receptor epsilon-like subunit [Mus
musculus] 


HG1000640N0_160000_gene_predictio
n1


gi|21313034|ref|NP_08034o.1| RIKEN cDNA
2900091E11 [Mus musculus]


HG1001000N0_160000_gene_predictio
n1


gi|10181212|ref|NP_06Sol3.1| RIKEN cDNA
1300007B12; clone MNCb-2755 [Mus
musculus]


HG1001418N0_160000_gene_predictio
n1


gi|20819462|ref|XP_158058.1| hypothetical
protein XP_158058 [Mus musculus]


HG1000153N0_20000_gene_prediction
1


gi|26379523|dbj|BAB29070.2| unnamed protein
product [Mus musculus]


HG1000255N0_160000_gene_predictio
n1


gi|13385532|ref|NP_080303.1| RIKEN cDNA
2700086123 [Mus musculus]


HG1000186N0_160000_gene_predictio
n1


gi|20963196|ref|XP_135684.1| RIKEN cDNA
1700022L20 [Mus musculus]


HG1000259N0_160000_gene_predictio
nl 


gi|26360198|dbj|BAB25612.2| unnamed protein 
product [Mus musculus] 


HG1000559N0_10000_gene_prediction
1




HG1000084N0_10000_gene_prediction
1


gi|6678794|ref|NP_032953.1| mitogen activated
protem jonase jcmase m/\x jonase jonasc i, 
protein kinase, mitogen activated, kinase 1 , p45 
[Mus musculus] 


HG1000217N0_160000_gene_predictio 
nl 


gi|6681015|ref|NP_031789.1| cysteine rich
intestinal protein [Mus musculus] 


HG1000217N0_160000_gene_predictio
n2 


gi|6681015|ref|NP_031789.1| cysteine rich 
intestinal protein [Mus musculus] 


HG1000329N0_160000_gene_predictio
n1


gi|26330870|dbj|BAC29165.1| unnamed protein
product [Mus musculus] 


HG1000570N0_160000_gene_predictio
n1


gi|6716522|gb|AAF26675.1|AF155821_1
CPG16 [Mus musculus]


HG1000617N0_40000_gene_prediction
1


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus]


HG1000227N0_160000_gene_predictio
n1


gi|21362402|sp|Q9CZB0|C560_MOUSE
Succinate dehydrogenase cytochrome b560
subunit, mitochondrial precursor (Integral
membrane protein CII-3) (QPS1) (QPs-1)


HG1000269N0_10000_gene_prediction
1 


gi|7706341|ref|NP_057145.1| yippee protein 
[Homo sapiens] 


HG1000615N0_160000_gene_predictio
n2


gi|4506725|ref|NP_000998.1| ribosomal protein
S4, X-linked X isoform; 40S ribosomal protein
S4, X isoform; ribosomal protein S4X isoform;
single-copy abundant mRNA; cell cycle gene 2
[Homo sapiens]


HG1000617N0_160000_gene_predictio
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus 
musculus domesticus] 


HG1000621N0_160000_gene_predictio 


gi|4506725|ref|NP_000998.1| ribosomal protein
S4, X-linked X isoform; 40S ribosomal protein
S4, X isoform; ribosomal protein S4X isoform;
single-copy abundant mRNA; cell cycle gene 2


HG1000990N0_160000_gene_predictio
n1


gi|10946760|ref|NP_067381.1| triggering

triggering receptor expressed in monocytes 1 
[Mus musculus] 


HG1000998N0_160000_gene_predictio
n1


gi|6678483|ref|NP_033483.1| ubiquitin-
activating enzyme E1, Chr X [Mus musculus]


HG1001225N0_160000_gene_predictio
n1


gi|10181192|ref|NP_065589.1| sulfotransferase-
related protein SULT-X1 [Mus musculus]


HG1001269N0_5000_gene_prediction1


gi|21311883|ref|NP_080887.1| RIKEN cDNA
0610007007 [Mus musculus] 


HG1001269N0 160000_gene_predictio 
nl 


gi|21311883|ref|NP_080887.1| RIKEN cDNA 
0610007007 [Mus musculus] 


HG1000103N0_160000_gene_predictio
nl 


gi|26327721|dbj|BAC27604.1| unnamed protein 
product [Mus musculus] 


HG1000143N0_1000_gene_predictionl 


gi|14141193|ref|NP_001004.2| ribosomal
protein S9; 40S ribosomal protein S9 [Homo 
sapiens] 


HG1000396N0_160000_gene_predictio
n1


gi|25024769|ref|XP_207136.1| similar to ORF2
[Mus musculus domesticus] 


HG1001502N0_160000_gene_predictio
n2


gi|2144100|pir||I64837 Set beta isoform - rat


HG1000066N0_160000_gene_predictio
n1


gi|26337951|dbj|BAC32661.1| unnamed protein
product [Mus musculus]


HG1000078N0_1000_gene_prediction1


gi|26346587|dbj|BAC36942.1| unnamed protein
product [Mus musculus]


HG1000117N0_160000_gene_predictio
n1


gi|20875580|ref|XP_131162.1| sorting nexin 7
[Mus musculus]


HG1000157N0_160000_gene_predictio
n1


gi|26344914|dbj|BAC36106.1| unnamed protein 
product [Mus musculus] 


HG1000194N0_160000_gene_predictio
n1


gi|21313022|ref|NP_083674.1| RIKEN cDNA
5730496E24 [Mus musculus] 


HG1000501N0_160000_gene_predictio
n1


gi|27370478|ref|NP_766552.1| hypothetical
protein E130310N06 [Mus musculus]


HG1000656N0_10000_gene_prediction
1


gi|12855078|dbj|BAB30210.1| unnamed protein
product [Mus musculus]


HG1000656N0_10000_gene_prediction
2


gi|12855078|dbj|BAB30210.1| unnamed protein
product [Mus musculus]


HG100075GN0 160000_gene_predictio 
nl 


gi|26336392|dbj|BAC31881.1| unnamed protein
product [Mus musculus] 


HG1001012N0_160000_gene_predictio
n1


gi|213125u4|ref|NP_0ol554.1| RIKEN cDNA
2810432D09 [Mus musculus]


HG1001237N0_10000_gene_prediction


gi|208o29oo|ref|XP_12ozlo.1| similar to
Hermansky-Pudlak syndrome protein variant
[Rattus norvegicus] [Mus musculus]


HG1000228N0_40000_gene_prediction
1


gi|26342390|dbj|BAC34857.1| unnamed protein
product [Mus musculus]


HG1000228N0_20000_gene_prediction
1


gi|13507676|ref|NP_109647.1| pumilio 1
(Drosophila) [Mus musculus]


HG1000228N0_160000_gene_predictio


gi|13507676|ref|NP_109647.1| pumilio 1
(Drosophila) [Mus musculus]


HG1000390N0_160000_gene_predictio
n1


gi|20892585|ref|XP_147977.1| RIKEN cDNA
2610001E17 [Mus musculus]


HG1000409N0_10000_gene_prediction
1


gi|26006245|dbj|BAC41465.1| mKIAA1047
protein [Mus musculus] 


HG1000611N0_160000_gene_predictio
n1


gi|6650539|gb|AAF21895.1|AF103877_1
epsilon-sarcoglycan [Mus musculus]


HG1000847N0_10000_gene_prediction
1 




Wni ftOOfll ^NO 0 oene nredictioill 


gi|20467423|ref|NP_620570.1| chondroitin
sulfate proteoglycan 4 [Mus musculus]


WOI onoORRNft SOOO ffene DTediction 


gi|16741633|gb|AAH16619.1| pyruvate kinase
3 [Mus musculus] 


HG1000143N0_10000_gene_prediction
1


gi|20896345|ref|XP_128324.1| carbonyl
reductase 3 [Mus musculus] 


HG1000167N0_5000_gene_prediction1


gi|12848663|dbj|BAB28043.1| unnamed protein 
product [Mus musculus]


HG1000243N0_5000_gene_prediction1


gi|8393534|ref|NP_058653.1| high mobility
protein 17 [Mus musculus]


HG1000825N0_160000_gene_predictio
n1


gi|21311983|ref|NP_080956.1| RIKEN cDNA
0610012C01 [Mus musculus] 


HG1001019N0_1000_gene_prediction1


gL|ZOjHJ /05^|auj|i5/\.v^jjj'fi.ij unnainea proiem 
product [Mus musculus] 


HG1000044N0_160000_gene_predictio
n1


gi|15079309|gb|AAH11494.1| Similar to
Myosin of the dilute-myosin-V family [Mus
musculus] 


HG1000100N0_10000_gene_prediction
1


gi|4506127|ref|NP_002755.1| phosphoribosyl
pyrophosphate synthetase 1 [Homo sapiens]


HG1000149N0_160000_gene_predictio
n1


gi|12834813|dbj|BAB23054.1| unnamed protein
product [Mus musculus]


HG1000183N0_1000_gene_prediction1


gi|27370150|ref|NP_766364.1| hypothetical
protein D630002G06 [Mus musculus]


HG1000183N0_160000_gene_predictio


gi|27370150|ref|NP_766364.1| hypothetical
protein D630002G06 [Mus musculus] 


HG1000213N0_5000_gene_prediction1


gi|6753178|ref|NP_035923.1| breakpoint cluster
region protein 1; barrier to autointegration
factor [Mus musculus]


HG1000294N0_5000_gene_prediction1


gi|18390327|ref|NP_083908.1| protein
phosphatase 1, regulatory (inhibitor) subunit
11; t-complex testis-expressed 5 [Mus
musculus] 


HG1000331N0_160000_gene_predictio


gi|20840824|ref|XP_141031.1| similar to slit
homolog 1 (Drosophila); slit (Drosophila)
homolog 1; slit 1 [Homo sapiens] [Mus
musculus]


HG1000391N0_160000_gene_predictio


gi|20887543|ref|XP_134475.1| RIKEN cDNA
2310022B05 [Mus musculus]


HG1000430N0_160000_gene_predictio
n1


gi|26382861|dbj|BAC25510.1| unnamed protein 
product [Mus musculus] 


HG1000597N0_160000_gene_predictio
n1


gi|26325886|dbj|BAC26697.1| unnamed protein
product [Mus musculus] 


HG1000078N0_5000_gene_prediction1


gi|26346587|dbj|BAC36942.1| unnamed protein
product [Mus musculus] 


HG1000139N0_5000_gene_prediction1


hypothetical protein FLJ13920 [Homo sapiens]
[Mus musculus] 


HG1000143N0_160000_gene_predictio
n1


gi|20896345|ref|XP_128324.1| carbonyl
reductase 3 [Mus musculus] 


HG1000162N0_160000_gene_predictio
n1


gi|20835770|ref|XP_132127.1| similar to 60S
RIBOSOMAL PROTEIN L13 [Mus musculus] 


HG1000168N0_160000_gene_predictio
n1


gi|12841593|dbj|BAB25272.1| unnamed protein
product [Mus musculus]


HG1000187N0_160000_gene_predictio


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1000247N0_160000_gene_predictio
n1


musculus]


HG1000273N0_160000_gene_predictio
n2


gi|25030042|ref|XP_2073b7.1| similar to
Retrovirus-related POL polyprotein [Mus


HG1000415N0_10000_gene_prediction


gi|9367840|emb|CAB97523.1| hypothetical
protein, weakly similar to {At luzo II) neuronal
apoptosis inhibitory protein 2 [Mus musculus] 
[Homo sapiens] 


HG1000539N0 160000_gene_predictio 
nl 


gi|7521942|pir||T29096 gag polyprotein -
murine endogenous retrovirus ERV-L 


HG1000539N0_160000_gene_predictio
n2 


gi|7521942|pir||T29096 gag polyprotein - 
murine endogenous retrovirus ERV-L 


HG1000560N0_160000_gene_predictio
n1


gi|12860683|dbj|BAB32021.1| unnamed protein
product [Mus musculus] 


HG1000618N0_10000_gene_prediction
1


gi|26350749|dbj|BAC39011.1| unnamed protein
product [Mus musculus] 


HG1000740N0_160000_gene_predictio
n1


gi|23601536|ref|XP_130965.2| Nice-4 protein
homolog [Mus musculus] 


HG1001197N0_160000_gene_predictio
n1


gi|26327779|dbj|BAC27630.1| unnamed protein
product [Mus musculus] 


HG1000599N0_5000_gene_predictionl 


gij i zo jor>^z|cioj /u i • 1 1 unnamec protem 
product [Mus musculus] 


xAVjiuuuuzrVi^u ji/uu gene, picuicuoni 


gi|20887101|ref|XP_129228.1| similar to
phosphoglucomutase 5 [Homo sapiens] [Mus
musculus]


HG1000084N0 5000_gene_predictionl 


gi|6678794|ref|NP_032953.1| mitogen activated 
proiem Kmase lanase i , j\xr\Jr Kmase KUiase i , 
protein kinase, mitogen activated, kinase 1, p45 
[Mus musculus] 


HG1000135N0_5000_gene_prediction1


gi|21312189|ref|NP_081197.1| RIKEN cDNA
1810010A06 [Mus musculus] 


HG1000169N0_20000_gene_prediction
1


gi|20886743|ref|XP_129211.1| phosphoserine
aminotransferase [Mus musculus] 


HG1000169N0_160000_gene_predictio
n1


aminotransferase [Mus musculus] 


HG1000189N0_160000_gene_predictio
n1


gi|20879992|ref|XP_140210.1| similar to
BG:DS01759.1 gene product [Drosophila 
melanogaster] [Mus musculus] 


HG1000189N0_160000_gene_predictio
n2


gi|20879992|ref|XP_140210.1| similar to
BG:DS01759.1 gene product [Drosophila
melanogaster] [Mus musculus]


HG1000246N0_5000_gene_prediction1


gi|21450297|ref|NP_659157.1| UDP-

acetylgalactosaminyltransferase [Mus
musculus]


HG1000248N0_0_gene_prediction1


gi|9790219|ref|NP_062745.1| destrin; Sid23p
[Mus musculus]


HG1000288N0_10000_gene_prediction
1


gi|20909512|ref|XP_153447.1| hypothetical
protein XP_153447 [Mus musculus]


HG1000424N0_5000_gene_prediction1


gi|25031822|ref|XP_207741.1| hypothetical
protein XP_207741 [Mus musculus]


HG1000443NO 40000_gene_prediction 
1 


gi|26354072|dbj|BAC40666.1| unnamed protein 
product [Mus musculus]


HG1000590N0_1000_gene_prediction1


|1|Z0J /oU570|ClOj|i3Ai3ZOJj7j,Z| UTinnTnCQ. piUtPUU 

product [Mus musculus] 


HG1000626N0_160000_gene_predictio
n1


gi|9938030|ref|NP_064667.1| hypothetical
protein, MNCb-4193; hypothetical protein
MNCb-4193 [Mus musculus] 


HG1000871N0_160000_gene_predictio
n1


5i|o7 jzyDo|rei|jNir_^uo^ /4z. 1 1 acuvin a 
receptor, type II-like 1; activin receptor-like
kinase-1 [Mus musculus]


HG1000959N0_10000_gene_prediction
1


gi|22507385|ref|NP_081019.1| RIKEN cDNA
1110014F12 [Mus musculus]


HG1000961NO 160000_gene_predictio 
n3 


gi|20822904|ref|XP_131914.1| RIKEN cDNA
3110004018 [Mus musculus] 


HG1000974N0 5000_gene_predictionl 


g;i|2o3 /oU70|GDJ(J3AJdZoD73.z| UTinaiTieci piuicm 
product [Mus musculus] 


HG1001045N0_160000_gene_predictio
n1


gi|ZDUZUl3o|rei|-Ax_Zv / /o>.i| oumidr lo 
Retrovirus-related POL polyprotein [Mus 
musculus] 


HG1001110N0_0_gene_prediction1


gi|23956080|ref|NP_058675.1| putative
serine/threonine kinase [Mus musculus] 


HG1001223N0_1000_gene_prediction1


gi|26339658|dbj|BAC33500.1| unnamed protein
product [Mus musculus] 


HG1001281N0_160000_gene_predictio
n1


gi|15431279|ref|NP_203538.1| dedicator of
cyto-kinesis 2 [Mus musculus] 


HG1001317N0_5000_gene_prediction1


gi|26327365|dbj|BAC27426.1| unnamed protein 
product [Mus musculus] 


HG1001485N0_5000_gene_prediction1


otl0#?^0Tl^^lHKilTlAP9749fi 11 unnamed nrotein 
product [Mus musculus] 


HG1000674N0_160000_gene_predictio
n1


gi|24211881|sp|Q8VCR8|KML2_MOUSE
Myosin light chain kinase 2, skeletal/cardiac 
muscle (MLCK2) 


HG1001017N0_10000_gene_prediction
1


gi|25019831|ref|XP_207463.1| similar to
CD59B [Mus musculus]


HG1001017N0_1000_gene_prediction1


gi|25019831|ref|XP_207463.1| similar to
CD59B [Mus musculus]


HG1000014N0_160000_gene_predictio


fr;iA/;cn7/lAlT»>frKrP 0*^1^98 1l ATPnc** Na4-/FC+ 
ZxlOOOU /HH|I\31|X^x X JZfO«-l j /\.XxaoCj x^a.«/xv.' 

transporting, beta 3 polypeptide; ATPase,
Na+/K+ beta 3 polypeptide [Mus musculus]


HG1000043N0_160000_gene_predictio
n3


gi|26337385|dbj|BAC32378.1| unnamed protein
product [Mus musculus] 


HG1000052N0_160000_gene_predictio
n1


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus]


TTm nnonildNn ^OOO oene nrediction2 


gi|6678794|ref|NP_032953.1| mitogen activated

>T0tem Kmase Kmase i, ivmLr Kiiidsc jllucim; x , 
protein kinase, mitogen activated, kinase 1 , p45 
[Mus musculus]


HG1000093N0_1000_gene_prediction1


gi|26350865|dbj|BAC39069.1| unnamed protein
product [Mus musculus] 


HG1000105N0_160000_gene_predictio
n1


gi|14198371|gb|AAH08247.1| Similar to cyclin
32 [Mus musculus] 


HG1000157N0_1000_gene_prediction1


gi|5803225|ref|NP_006752.1| tyrosine
3/tryptophan 5-monooxygenase activation
protein, epsilon polypeptide; 14-3-3 epsilon;
mitochondrial import stimulation factor L
subunit; protein kinase C inhibitor protein-1
[Homo sapiens]


HG1000210N0_40000_gene_prediction
1


gi|17160840|gb|AAH17597.1| RIKEN cDNA

5830401B18 gene [Mus musculus] 


HG1000242N0_5000_gene_prediction1


/oyv^ /|rer|rsi^_uoz /oo.i| lynaj ^^nspw^ 
homolog, subfamily A, member 2; DNA J 
protein [Mus musculus] 


HG1000243N0_5000_gene_prediction2


gi|8393534|ref|NP_058653.1| high mobility
group protein 17 [Mus musculus]


HG1000256N0_160000_gene_predictio
n1


Adenylate kinase isoenzyme 1 (ATP-AMP
transphosphorylase) (AK1) (Myokinase)


HG1000279N0_0_gene_prediction1


gi|15617203|ref|NP_254279.1| chloride
intracellular channel 1 [Mus musculus] 


HG1000280N0_5000_gene_prediction1


gi|7106337|ref|NP_034796.1| keratin complex-
1, gene C29 [Mus musculus]


HG1000280N0_5000_gene_prediction2


gi|7106337|ref|NP_034796.1| keratin complex-
1, gene C29 [Mus musculus] 


HG1000282N0_160000_gene_predictio
n1


rrJI9nQn9ftO'^tr<»flYP 1 9R091 1 1 similar to 

Mitochondrial import receptor subunit TOM22
homolog (Translocase of outer membrane 22
kDa subunit homolog (hTom22) (1C9-2) [Mus 
musculus]


HG1000292N0_160000_gene_predictio
n1


niA08i^$ilrf>frhJP 0'^7')S^ 1l ribasomal Diotein 
S26 [Rattus norvegicus]


HG1000313N0_160000_gene_predictio
n1


phosphatase type IVA, member 1; Protein
tyrosine phosphatase IVA1 [Homo sapiens]


HG1000330N0_20000_gene_prediction


gi|22122511|ref|NP_666146.1| hypothetical
protein MGC30562 [Mus musculus]


HG1000339N0_160000_gene_predictio
n1


gi|26350551|dbj|BAC38915.1| unnamed protein 
product [Mus musculus] 


HG1000340N0_160000_gene_predictio
n1


gi|20912842|ref|XP_126689.1| RIKEN cDNA
1300001P08 [Mus musculus]


HG1000344N0_160000_gene_predictio
n1


gi|21450239|ref|NP_659092.1| hypothetical
protein MGC27983 [Mus musculus] 


HG1000365N0_20000_gene_prediction


gi|25046794|ref|XP_207489.1| similar to RNP
particle component [Mus musculus]


HG1000384N0_160000_gene_predictio
n1


gi|20909520|ref|XP_126941.1| RIKEN cDNA
2600011C06 [Mus musculus]


HG1000448N0_160000_gene_predictio
n1


gi|6678247|ref|NP_033358.1| transcription
factor 7-like 1 [Mus musculus]


HG1000482N0_160000_gene_predictio
n1


gi|26334795|dbj|BAC31098.1| unnamed protein
product [Mus musculus]


HG1000486N0_20000_gene_prediction
1


gi|26350551|dbj|BAC38915.1| unnamed protein
product [Mus musculus]


HG1000506N0_160000_gene_predictio
n1


gi|20909520|ref|XP_126941.1| RIKEN cDNA
2600011C06 [Mus musculus]


HG1000518N0_160000_gene_predictio
n1


gi|26351279|dbj|BAC39276.1| unnamed protein
product [Mus musculus]


HG1000550N0_160000_gene_predictio
n1


gi|20909520|ref|XP_126941.1| RIKEN cDNA
2600011C06 [Mus musculus]


HG1000556N0_160000_gene_predictio
n1


gi|25031497|ref|XP_207552.1| similar to
Retrovirus-related POL polyprotein [Mus
musculus] 


HG1000588N0_160000_gene_predictio
n1


gi|13277747|gb|AAH03768.1| interferon-
induced protein with tetratricopeptide repeats 1
[Mus musculus] 


HG1000600N0_160000_gene_predictio
n1


gi|20863376|ref|XP_134148.1| similar to
nypomencai proicm [^ivioivaw* looii/ii/ujicuioj lxyxu*» 
musculus] 


HG1000647N0_160000_gene_predictio
n1


regulatory T cell molecule; class I-restricted T 
cell-associated molecule [Mus musculus] 


HG1000648N0_160000_gene_predictio
n1


gi|20900199|ref|XP_128639.1| RIKEN cDNA
2810055C19 [Mus musculus] 


HG1000688N0_160000_gene_predictio
n1


gi|26327707|dbj|BAC27597.1| unnamed protein
product [Mus musculus]


HG10006Qf)Nn 160000 ffene Dtedictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus]


Hfrl 0nn7RR>Jn l^nnOO (rene nredictio 
nl 


gi|20847912|ref|XP_144610.1| similar to
KIAA1904 protein [Homo sapiens] [Mus
musculus] 


nl 


gi|20342176|ref|XP_110490.1| similar to
hypothetical protein MGC1955 [Homo sapiens]
[Mus musculus]


1 


gi|6753324|ref|NP_033968.1| chaperonin
subunit 6a (zeta); chaperonin containing TCP-1
[Mus musculus]


n2 


gi|6753324|ref|NP_033968.1| chaperonin
subunit 6a (zeta); chaperonin containing TCP-1
[Mus musculus]


HG1000902N0_1000_gene_prediction1


subunit 6a (zeta); chaperonin containing TCP-1
[Mus musculus] 


HG1000904N0_160000_gene_predictio
n1


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1000966N0_1000_gene_prediction1


gi|22122617|ref|NP_666215.1| hypothetical
protein MGC25511 [Mus musculus]


HG1000966N0_5000_gene_prediction1


gi|22122617|ref|NP_666215.1| hypothetical
protein MGC25511 [Mus musculus]


HG1000994N0_160000_gene_predictio
n1


gi|12855175|dbj|BAB30238.1| unnamed protein
product [Mus musculus]


HG1001014N0_160000_gene_predictio
n3


gi|26337385|dbj|BAC32378.1| unnamed protein 
product [Mus musculus] 


HG1001041N0_5000_gene_prediction1


gi|25071304|ref|XP_146497.3| similar to
protein serine kinase Pskhl [Mus musculus] 


HG1001337N0_160000_gene_predictio
n1


gi|27369704|ref|NP_766096.1| hypothetical
protein 6030499008 [Mus musculus] 


HG1001417N0 5000_gene_predictionl 


gi|26349767|dbj|BAC38523.1| unnamed protein
product [Mus musculus] 


HG1001485N0_160000_gene_predictio
n1


gi|7513636|pir||T30805 dutt1 protein - mouse


HG1000151N0 160000_gene_predictio 
nl 


gi|18044328|gb|AAH19573.1| Unknown
(protein for IMAGE:3990036) [Mus musculus] 


HG1000330N0_160000_gene_predictio
n3


gi|25029811|ref|XP_207217.1| similar to ORF2
[Mus musculus domesticus] 


HG1000957N0_20000_gene_prediction
1 


gi|25024769|reflXP_207136.1| similar to ORF2 
[Mus musculus domesticus] 


HG1000960N0_0_gene_prediction1


gi|20908689|ref|XP_127449.1| RIKEN cDNA
4632401C08 [Mus musculus]


HG1000960N0_0_gene_prediction2


gi|20908689|ref|XP_127449.1| RIKEN cDNA
4632401C08 [Mus musculus]


HG1001280N0_20000_gene_prediction


gi|26336763|dbj|BAC32064.1| unnamed protein
product [Mus musculus]


HG1001502N0_160000_gene_predictio
n1


gi|27370240|ref|NP_766415.1| hypothetical
protein 4732490P18 [Mus musculus]


HG1000003N0_10000_gene_prediction
1


gi|13624305|ref|NP_112440.1| procollagen,
type II, alpha 1 [Mus musculus]


HG1000041N0_160000_gene_predictio
n1


gi|26390169|dbj|BAC25854.1| unnamed protein
product [Mus musculus]


HG1000043N0_160000_gene_predictio
n2


gi|26337385|dbj|BAC32378.1| unnamed protein
product [Mus musculus]




gi|15079309|gb|AAH11494.1| Similar to
Myosin of the dilute-myosin-V family [Mus
musculus]


HG1000051N0_160000_gene_predictio
n1


gi|14250190|gb|AAH08515.1| interferon


HG1000057N0_160000_gene_predictio


gi|6755040|ref|NP_035202.1| profilin 1; actin
binding protein [Mus musculus]


HG1000060N0_160000_gene_predictio
n1


gi|6755901|ref|NP_035783.1| tubulin, alpha 1;


HG1000061N0_10000_gene_prediction
1


gi|20827552|ref|XP_130234.1| expressed

Qfvni<*nf*^ AW^1rt7^1 PM^iiq mii^fiiiln^l 


HG1000079N0_160000_gene_predictio
n1


gi|20887309|ref|XP_129200.1| adenylate kinase

flInVifl lilcp n^nc nni<if!iilii<jl 

J CU^XXOr XXA.W l^lVXUd lllUoV/UXUoJ 


HG1000098N0_160000_gene_predictio
n1


gi|26340666|dbj|BAC33995.1| unnamed protein
product [Mus musculus]


HGlOflOlO^NO soon frenp nre/lictionl 


gi|12850600|dbj|BAB28785.1| unnamed protein
product [Mus musculus]


HG1000121N0_160000_gene_predictio
n1


gi|26346402|dbj|BAC36852.1| unnamed protein
product [Mus musculus]


HG1000131N0_160000_gene_predictio
n1


gi|26329183|dbj|BAC28330.1| unnamed protein
product [Mus musculus]


HG1000134N0_160000_gene_predictio
n1


gi|12860377|dbj|BAB31934.1| unnamed protein
product [Mus musculus]


HG1000134N0_160000_gene_predictio
n2


gi|12860377|dbj|BAB31934.1| unnamed protein
product [Mus musculus]


HG1000136N0_160000_gene_predictio
n1


gi|26389519|dbj|BAC25745.1| unnamed protein
product [Mus musculus] 


nl 


gx|«/ / X f 7 / o|vXliU|\.yjriLr^/ •'v^x*x| iiwdvxxicu 

protein [Mus musculus] 


HG1000166N0_160000_gene_predictio
n1


gi|20908717|ref|XP_127445.1| similar to
flavoprotein subunit of succinate-ubiquinone
reductase [Rattus norvegicus] [Mus musculus]


HG1000172N0_1000_gene_prediction1


gi|6681095|ref|NP_031834.1| cytochrome c,
somatic [Mus musculus] 


HG1000172N0_1000_gene_prediction2


gi|6681095|ref|NP_031834.1| cytochrome c,
somatic [Mus musculus]


HG1000175N0_5000_gene_prediction1


gi|26354216|dbj|BAC40736.1| unnamed protein
product [Mus musculus]


HG1000175N0_10000_gene_prediction
1


gi|26354216|dbj|BAC40736.1| unnamed protein
product [Mus musculus]


HG1000175N0_160000_gene_predictio
nl 


gi|26354216|dbj|BAC40736.1| unnamed protein 
product [Mus musculus] 


HG1000175N0_1000_gene_prediction1


gi|26354216|dbj|BAC40736.1| unnamed protein
product [Mus musculus] 


HG1000192N0_160000_gene_predictio 
nl 


gi|10946614|ref|NP_067287.1| WD repeat
domain 12; nuclear protein Ytm1 [Mus
musculus]


HG1000193N0_160000_gene_predictio


gi|21728370|ref|NP_080178.1| RIKEN cDNA


HG1000195N0_160000_gene_predictio 
nl 


gi|17390530|gb|AAH18231.1| Unknown 
(protein for MGC:19236) [Mus musculus]


HG1000197N0_160000_gene_prediction1


gi|21450185|ref|NP_659063.1| hypothetical

pruicin jvxvjv^zoioQ [^iViUd muscuiusj 


HG1000202N0_20000_gene_prediction1


gi|26331946|dbj|BAC29703.1| unnamed protein
product [Mus musculus]


HG1000210N0_20000_gene_prediction1


gi|17160840|gb|AAH17597.1| RIKEN cDNA 


HG1000218N0_1000_gene_prediction1


gi|6681015|ref|NP_031789.1| cysteine rich
intestinal protein [Mus musculus]


HG1000218N0_160000_gene_prediction1


gi|6681015|ref|NP_031789.1| cysteine rich
intestinal protein [Mus musculus]


HG1000218N0_10000_gene_prediction1


gi|6681015|ref|NP_031789.1| cysteine rich
intestinal protein [Mus musculus]


HG1000222N0_1000_gene_prediction1


gi|13385054|ref|NP_079873.1| RIKEN cDNA
2700033116 [Mus musculus]


HG1000233N0_1000_gene_prediction1


gi|12847362|dbj|BAB27541.1| unnamed protein
product [Mus musculus]


HG1000234N0_1000_gene_prediction1


gi|12847362|dbj|BAB27541.1| unnamed protein
product [Mus musculus] 


HG1000234N0_160000_gene_prediction1


gi|12847362|dbj|BAB27541.1| unnamed protein
product [Mus musculus]


HG1000238N0_160000_gene_prediction2


gi|6671549|ref|NP_031479.1| anti-oxidant
protein 2; acidic calcium-independent
phospholipase A2; peroxiredoxin 5; 1-Cys Prx
[Mus musculus]


HG1000240N0_160000_gene_predictio


gi|26328673|dbj|BAC28075.1| unnamed protein


HG1000245N0_160000_gene_prediction1

HG1000245N0_5000_gene_prediction1



Fantom Top Hit Annotation 
product [Mus musculus]



gi|12850132|dbj|BAB28604.1| unnamed protein

product [Mus musculus]

gi|12850132|dbj|BAB28604.1| unnamed protein
product [Mus musculus]



sin 



HG1000249N0_10000_gene_prediction



gi|6754654|ref|NP_034905.1| mannose binding
lectin, liver (A) [Mus musculus]



HG1000251N0_160000_gene_prediction1



HG1000252N0_5000_gene_prediction1
HG1000254N0_160000_gene_prediction1



HG1000262N0_160000_gene_predictio



gi|20881913|ref|XP_126211.1| Dullard

homolog [Mus musculus]

gi|20825536|ref|XP_129507.1| ring finger

protein 2 [Mus musculus]

gi|13385058|ref|NP_079878.1| hypothetical
protein mOEitd718e [Mus musculus]

1 /nrr. not^lOl 1 1 QTIirmi 



gi|21312163|ref|NP_082683.1| RIKEN cDNA
2900054P12 [Mus musculus]



HG1000264N0_1000_gene_prediction1



HG1000264N0_1000_gene_prediction2
HG1000270N0_20000_gene_prediction

HG1000270N0_1000_gene_prediction1
HG1000274N0_160000_gene_prediction1



HG1000276N0_160000_gene_prediction1



HG1000276N0_5000_gene_prediction1



gi|21624617|ref|NP_081018.1| RIKEN cDNA

1110007M04 [Mus musculus]

gi|21624617|ref|NP_081018.1| RIKEN cDNA

1110007M04 [Mus musculus]

gi|12844196|dbj|BAB26273.1| unnamed protein

product [Mus musculus]

gi|12852884|dbj|BAB2?566.1| unnamed protein

product [Mus musculus]

gi|26347831|dbj|BAC37564.1| unnamed protein

product [Mus musculus]

gi|19527228|ref|NP_598768.1| DNA segment,
Chr 10, ERATO Doi 214, expressed [Mus

musculus]

gi|19527228|ref|NP_598768.1| DNA segment,
Chr 10, ERATO Doi 214, expressed [Mus
musculus]



HG1000278N0_5000_gene_prediction1



HG1000280N0_160000_gene_prediction1



HG1000280N0_1000_gene_prediction1



gi|19527026|ref|NP_598568.1| expressed
sequence AA959742 [Mus musculus]



gi|7106337|ref|NP_034796.1| keratin complex-
1, gene C29 [Mus musculus]



gi|7106337|ref|NP_034796.1| keratin complex-
1, gene C29 [Mus musculus]



HG1000280N0_160000_gene_prediction2



HG1000280N0_1000_gene_prediction2
HG1000305N0_5000_gene_prediction1



gi|7106337|ref|NP_034796.1| keratin complex-
1, gene C29 [Mus musculus]

gi|7106337|ref|NP_034796.1| keratin complex-

1, gene C29 [Mus musculus]

gi|27369902|ref|NP_766218.1| hypothetical
protein A530095G11 [Mus musculus]


HG1000305N0_5000_gene_prediction2


gi|27369902|ref|NP_766218.1| hypothetical
protein A530095G11 [Mus musculus]


HG1000307N0_160000_gene_prediction1


gi|8393853|ref|NP_058614.1| nudix (nucleoside
diphosphate linked moiety X)-type motif 5
[Mus musculus]


Hmnfi0'^34N0 160000 eene oiedictio 


gi|20888553|ref|XP_134832.1| similar to
probable serine/threonine protein kinase
SNF1LK [Mus musculus]


HG1000335N0_160000_gene_prediction1


gi|20888553|ref|XP_134832.1| similar to
probable serine/threonine protein kinase
SNF1LK [Mus musculus]


■ 

HG1000337N0_5000_gene_prediction1


gi|12851918|dbj|BAB29207.1| unnamed protein
product [Mus musculus] 


HG1000343N0_160000_gene_predictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1000343N0_160000_gene_prediction2


gi|13386340|ref|NP_083008.1| RIKEN cDNA
4632428N05 [Mus musculus]


HG1000369NO_160000 _gene_predictio 
nl 


gi|12837873|dbj|BAB23982.1| unnamed protein 
product [Mus musculus] 


HG1000372NO_160000 .^genejredictio 
nl 


gi|20913947|ref|XP_126555.1| RIKEN cDNA
1 1 90006Kd 1 [Mus musculus] 


HG1000378NO_160000 _gene_predictio 
nl 


gi|26348995|dbj|BAC38137.1| unnamed protein
product [Mus musculus] 


HG1000387NO_160000 j6ene_predictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1000387N0_160000_gene_predictio 
n2 


gi|26382861|dbj|BAC25510.1| unnamed protein
product [Mus musculus] 


HG1000397N0 5000 gene_predictionl 


gi|20836469|ref|XP_129717.1| hypothetical
protein XP_129717 [Mus musculus]


HG1000408NO_1 60000 ^ene_predictio 
nl 




HG1000414NO_160000 ^ene_predictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1000431N0_160000_gene_prediction1


atiR^0dns7lreflNP 058565.11 low density 
lipoprotein receptor-related protein 4; low 
density lipoprotein-related protein 4; Low 
Density Lipoprotein Receptor Related Protein 
4; corin [Mus musculus] 


HG1000439N0_160000_gene_prediction1


gi|12851918|dbj|BAB29207.1| unnamed protein
product [Mus musculus]


HG1000449N0_20000_gene_prediction1


gi|25025117|ref|XP_207206.1| similar to
transcription factor-like nuclear regulator,
putative transcription regulation nuclear
protein; putative transcription factor-like
nuclear regulator, TATA box binding protein
(TBP)-associated factor, RNA polymerase III,
GTF3B subunit 1; ... [Mus musculus]


HG1000457N0_160000_gene_prediction1


gi|20824761|ref|XP_133346.1| liver-specific
bHLH-Zip transcription factor [Mus musculus]


HG1000458N0_160000_gene_prediction1


gi|12841242|dbj|BAB25129.1| unnamed protein
product [Mus musculus]


HG1000461N0_160000_gene_prediction1


gi|25032310|ref|XP_205729.1| hypothetical
protein XP_205729 [Mus musculus]


HG1000463N0_160000_jenej)redictio 
nl 


gi|12861068|dbj|BAB32114.1| unnamed protein 
product [Mus musculus] 


HG1000463N0_160000_gene_prediction2


gi|13249351|ref|NP_076402.1| inositol-
requiring 1 alpha (yeast) [Mus musculus]


HG1000476N0_160000_gene_prediction1


gi|26332657|dbj|BAC30046.1| unnamed protein
product [Mus musculus]


HG1000481N0_160000_gene_prediction1


gi|21311873|ref|NP_077181.1| RIKEN cDNA
0610007A03 [Mus musculus]


nl 


gi|20860491|ref|XP_153755.1| hypothetical
protein XP_153755 [Mus musculus]


HG1000556N0_160000_gene_prediction2


gi|25031497|ref|XP_207552.1| similar to
Retrovirus-related POL polyprotein [Mus
musculus]


HG1000584N0_160000_gene_prediction1


gi|27370500|ref|NP_766581.1| hypothetical
protein D230008H22 [Mus musculus] 


HG1000587N0_160000_gene_prediction1


gi|23682449|ref|XP_158842.2| hypothetical
protein XP_158842 [Mus musculus]


HG1000592N0_160000_gene_prediction1


gi|26349599|dbj|BAC38439.1| unnamed protein
product [Mus musculus]


HG1000594N0_160000_gene_prediction1


gi|22095015|ref|NP_084065.1| RIKEN cDNA
0610013117 [Mus muisculus] 


HG1000594N0_160000_gene_prediction2


gi|22095015|ref|NP_084065.1| RIKEN cDNA
0610013117 [Mus musculus] 


HG1000608N0 160000_gene_predictio 
nl 


^ion'j/i<cO'0Hwfnrp lrt077R 11 Qimilarto 1 

gJ|ZUo4jZ-Zj|rei|A»r__lv5' / IO*x\ annual IV 1 

Neurabin-II (Neural tissue-specific F-actin
binding protein II) (Protein phosphatase 1
regulatory subunit 9B) (Spinophilin) (p130)
(PP1bp134) [Mus musculus]


wni rionfil S>JO l ^OnOO aene nredictio 

Xlvjl UVViO 1 JiN Vl l\j\J\J\J\J 14^^^ 

nl 


gi|7710032|ref|NP_057928.1| growth factor
receptor bound protein 14 [Mus musculus] 


HG1000620N0 160000_gene_pi:edictio 
nl 


gil250524621reflXPJ3810531 similar to TAR 
DNA-binding protein-43 (TDP-43) [Mus 
musculus] 


HG1000621N0_160000_gene_prediction1


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 






FPID


Fantom Top Hit Annotation


HG1000621N0_160000_gene_prediction3


gi|26382861|dbj|BAC25510.1| unnamed protein
product [Mus musculus]


HG1000631N0_40000_gene_prediction1


gi|6681283|ref|NP_031938.1| epidermal growth
factor receptor; avian erythroblastic leukemia
viral (v-erb-b) oncogene homolog [Mus

musculus]


HG1000652N0_160000_gene_prediction1


endonuclease/reverse transcriptase [Mus
musculus] 


HG1000663N0_160000_gene_prediction1


gi|20915416|ref|XP_162987.1| hypothetical
protein XP_162987 [Mus musculus]


HG1000686N0 160000_jgene_predictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1000700N0_160000_gene_prediction1


gi|16508047|gb|AAL17972.1| pORF2 [Mus
musculus domesticus] 


HG1000701N0 160000_gene_predictio 
nl 


gi|26327167|dbj|BAC27327.1| unnamed protein 
product [Mus musculus] 


HG1000709N0_160000_gene_prediction1


gi|220579|dbj|BAA00448.1| open reading
frame (196 AA) [Mus musculus]


nl 


gi| 1Z041 ozo|aDj|i5AJt3z jioo. 1 1 unnamed protem 
product [Mus musculus] 


HG1000720N0 160000_jene_predictio 
nl 


gi| ioj /'fiD|ier|]Ni'_U3Dyoo.Z| odd u^ten-m 
homolog 2 (Drosophila); odd Oz/ten-m 
homolog 3 (Drosophila) [Mus musculus]


HG1000727N0 160000_gene_predictio 
nl 


gi|26335645|dbj|BAC31523.1| unnamed protein
product [Mus musculus] 


HG1000743NO 160000_gene_predictio 
n2 


gi|26338834|dbj|BAC33088.1| unnamed protein 
product [Mus musculus]


HG1000767N0_5000_gene_prediction1


gi|12851918|dbj|BAB29207.1| unnamed protein
product [Mus musculus] 


HG1000786N0 160000_genejiredictio 
n2 


gi|6678303|ref|NP_033386.1| transcription
factor A, mitochondrial [Mus musculus]


HG1000822NO 160000 eene nredictio 
nl 


gi|6680195|ref|NP_032255.1| histone 
deacetylase 2; DNA segment, Chr 10, Wayne 
State University 179, expressed [Mus
musculus] 


HG10O0829N0 160000_jene_predictio 
nl 


^'l914Sni 50lrpfl>JP ^SOOAQ ll/^nXTA c^mim/^^ 

s^I^at^jvi jjf|jroi|pir__ujj^u*Ty. i| cl^in/tl sec^ucnce 
BC024131; hypothetical protein MGC37896 
[Mus musculus] 


HG1000848N0 160000^ene_piedictio 


gi|26350995|dbj|BAC39134.1| unnamed protein 
product [Mus musculus] 


HG1000860N0 160000 _gene_predictio 
nl 


gi|26325678|dbj|BAC26593.1| unnamed protein 
product [Mus musculus] 


HG1000898N0_10000_gene_prediction1


gi|21450209|ref|NP_659075.1| hypothetical
protein MGC25509 [Mus musculus]


HG1000898N0 160000 ^ene_predictio 
nl 


gi|21450209|ref|NP_659075.1| hypothetical
protein MGC25509 [Mus musculus] 


HG1000898N0_20000_gene_prediction1


gi|21450209|ref|NP_659075.1| hypothetical
protein MGC25509 [Mus musculus] 


HG1000902N0_160000_gene_prediction1


gi|21450209|ref|NP_659075.1| hypothetical
protein MGC25509 [Mus musculus]


HG1000904N0 160000_gene_predictio 
n3 


gi|6753324|ref|NP_033968.1| chaperonin
subunit 6a (zeta); chaperonin containing TCP-1
[Mus musculus] 


HG1000906N0 20000jgene_prediction 
1 


gi|20344324|ref|XP_109683.1| RIKEN cDNA
1810027010 [Mus musculus] 


HG1000906NO 160000_£enejpredictio 
nl 


gi|26346114|dbj|BAC36708.1| unnamed protein 
product [Mus musculus]


HG1000921N0_5000_jgene_predictionl 


gi|26346114|dbj|BAC36708.1| unnamed protein
product [Mus musculus] 


HG1000938N0 10000_gene_prediction 
1 


gi|26350775|dbj|BAC39024.1| unnamed protein 
product [Mus musculus] 


HG1000952N0 160000_gene_predictio 
nl 


gi|26339054|dbj|BAC33198.1| unnamed protein 
product [Mus musculus] 


HG1000961N0 160000 eene nredictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus]


HG1000961N0 160000 eene medictio 
n2 


gi|25051287|ref|XP_146665.3| similar to

fCTA AO^^V? 'nn>tf>in n-T^mn caniAncI rii^iie 
xvLn^vo / / piv/iciu ^flUlUvl oapiCilSJ l^iYLUo 

musculus] 


HGIOOIOOONO 160000_gene_predictio 
n2 


eukaryotic initiation factor 5 [Rattus

norvegicus] [Mus musculus] 


HG1001003NO 160000_gene_predictio 
nl 


gi|19527072|ref|NP_598613.1| expressed
sequence AW555139 [Mus musculus] 


HG1001007N0_160000_gene_prediction1


gi|13277825|gb|AAH03796.1| Similar to
lymphocyte specific 1 [Mus musculus]


HG1001009N0_0_gene_predictionl 


gi|26334641|dbj|BAC31021.1| unnamed protein 
product [Mus musculus] 


HG1001014NO 160000 ^gene_predictio 
n2 


gi|26329567|dbj|BAC28522.1| unnamed protein 
product [Mus musculus] 


HG1001017NO 40000_gene_prediction 
1 


gi|26337385|dbj|BAC32378.1| unnamed protein
product [Mus musculus] 


HG1001017N0 20000^ene_prediction 
1 


gi|25019831|ref|XP_207463.1| similar to
CD59B [Mus musculus] 


HG1001144N0 160000_£ene_predictio 
nl 


gi|25019831|ref|XP_207463.1| similar to
CD59B [Mus musculus] 


HG1001172N0_160000_gene_prediction2


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus]


HG1001214N0_20000_gene_prediction1


gi|26340706|dbj|BAC34015.1| unnamed protein
product [Mus musculus]


HG1001229N0 160000^ene_predictio 


• 


HG1001253NO 160000_gene_predictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1001253N0_160000_gene_prediction2


gi|26326251|dbj|BAC26869.1| unnamed protein
product [Mus musculus] 


HG1001267N0_160000_gene_prediction1


gi|26326251|dbj|BAC26869.1| unnamed protein
product [Mus musculus] 


HG1001289N0 160000_gene_predictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1001343N0 10000_gene_piediction 
1 


gi|26333317|dbj|BAC30376.1| unnamed protein 

product [Mus musculus]


HG1001343NO 160000_gene_predictio 
nl 


gi|6755060|ref|NP_035214.1|
phosphatidylinositol 3-kinase, C2 domain
containing, gamma polypeptide [Mus
musculus]


HG1001390N0 160000_gene_predictio 
nl 


gi|6755060|ref|NP_035214.1|
phosphatidylinositol 3-kinase, C2 domain
containing, gamma polypeptide [Mus
musculus] 


HG1001468N0 160000 .jene_predictio 
nl 


gi|6680083|ref|NP_032189.1| growth factor
receptor bound protein 2 [Mus musculus] 


HG1001508N0 160000_gene_predictio 
n2 


gi|25030495|ref|XP_205178.1| similar to
bA130N24.1 (novel protein similar to REV3L

of DNA polymerase zeta) (POLZ)) [Homo
sapiens] [Mus musculus]


HG1000084NO 160000 ^ehe_predictio 
nl 


gi|26382861|dbj|BAC25510.1| unnamed protein 
product [Mus musculus] 


HG1000084N0 160000_gene_piedictio 
n2 


gi|25031822|ref|XP_207741.1| hypothetical
protein XP_207741 [Mus musculus] 


HO1000209N0 160000 eene niedictio 
nl 


gi|25031822|ref|XP_207741.1| hypothetical
protein XP_207741 [Mus musculus] 


HG1000382N0 160000_gene_predictio 
nl . 


gi|20858167|ref|XP_125585.1| similar to
PTD013 protein; CGI-24 protein [Mus
musculus]


HG1000591N0 160000_£ene_predictio 
nl 


gi|6678716|ref|NP_032539.1| low density
lipoprotein receptor-related protein 5; low
density lipoprotein-related protein 5 [Mus
musculus]


HG1000904N0_160000_^ne_predictio 
n4 


gi|26330005|dbj|BAC28741.1| unnamed protein 
product [Mus musculus]


HG1000005N0_160000_gene_prediction1


gi|20835832|ref|XP_129684.1| complement
receptor 2 [Mus musculus]


nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1000015N0 160000_jene_predictio 
nl 


gijooou /44|rei|rNr_u J xozo. 1 1 A i rase, rNa+/JvT 
transporting, beta 3 polypeptide; ATPase, 
Na+/K+ beta 3 polypeptide [Mus musculus]


HG1000015N0 20000_gene_prediction 
1 


gi|20467423|ref|NP_620570.1| chondroitin
sulfate proteoglycan 4 [Mus musculus]


HG1000015N0_5000_gene_j)redictionl 


gi|20467423|ref|NP_620570.1| chondroitin
sulfate proteoglycan 4 [Mus musculus]


HG1000015N0_160000_gene_prediction2


gi|20467423|ref|NP_620570.1| chondroitin
sulfate proteoglycan 4 [Mus musculus]


HG1000020N0 l«)000_gene_predictio 


gi|20467423|ref|NP_620570.1| chondroitin
sulfate proteoglycan 4 [Mus musculus]


HG1000020N0_5000jgene_prediction2 


gi|26330706|dbj|BAC29083.1| unnamed protein
product [Mus musculus] 


HG1000024N0_10000_gene_prediction1


gi|20887101|ref|XP_129228.1| similar to
phosphoglucomutase 5 [Homo sapiens] [Mus 
musculus] 


HG1000026N0 160000_^eaie_predictio 
nl 


gi|12853786|dbj|BAB29848.1| unnamed protein 
product [Mus musculus] 


HG1000030NO 160000^enejpredictio 
nl 


gi|9506367|ref|NP_062425.1| ATP-binding
cassette, sub-family B, member 10; ATP-
binding cassette, sub-family B (MDR/TAP),
member 12; Abc-mitochondrial erythroid [Mus
musculus]


HG1000039N0 160000_£ene_predictio 
nl 


gi|26006203|dbj|BAC41444.1| mKIAA0696
protein [Mus musculus]


HG1000041N0_5000_gene_prediction1


gi|7106453|ref|NP_035897.1| zinc finger RNA
binding protein [Mus musculus]


HG1000043N0 160000L£ene_predictio 
nl 


gi|26390169|dbj|BAC25854.1| unnamed protein 
product [Mus musculus] 


HG1000043N0_5000 jgene_predictionl 


gi|26337385|dbj|BAC32378.1| unnamed protein 
product [Mus musculus] 


xivjiVA/uu'HfiNu zwui/^Kcne preQicuon 

i 


gi|zo^j/joD|aDj|r>/\i^JZ3/o.i| unnamea pFOtem 
product [Mus musculus] 


HG1000052N0 160000_genejprediictio 
n2 


gi|i ju/yjuy|go|AArii I'fy^.il oimuarto 
Myosin of flie dilute-myosin-V family [Mus 
musculus] 


HG1000052N0 10000_geiie_prediction 
1 


gi|26324852|dbj|BAC26180.1| unnamed protein
product [Mus musculus]


HG1000052N0 20000jgene_j)rediction 
1 


gi|26324852|dbj|BAC26180.1| unnamed protein
product [Mus musculus]


HG1000058N0_10000_gene_prediction1


gi|26324852|dbj|BAC26180.1| unnamed protein
product [Mus musculus] 


HG1000061N0_5000_jene_predictionl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1000065N0_5000_gene_prediction1


gi|5031571|ref|NP_005713.1| actin-related
protein 2; ARP2 (actin-related protein 2, yeast)
homolog [Homo sapiens] 


HG1000065N0 10000_£eiie_prediction 
1 


gi|13386220|ref|NP_081610.1| RIKEN cDNA
2210414H16 [Mus musculus] 


HG1000065N0 160000_jgenejpiedictio 
nl 


gi|13386220|ref|NP_081610.1| RIKEN cDNA
2210414H16 [Mus musculus] 


HG1000068NO 160000_gene_piedictio 
nl 


gi|13386220|ref|NP_081610.1| RIKEN cDNA
2210414H16 [Mus musculus] 


HGl 000070N0_0_geiie_predictionl 


gi|26326191|dbj|BAC26839.1| unnamed protein
product [Mus musculus] 


HG1000073N0 20000_^ene_piediction 
1 


gi|21595527|gb|AAH32275.1| Similar to
receptor-like tyrosine kinase [Mus musculus] 


HG1000075N0 160000_gene_predictio 
nl 


gi|26326407|dbj|BAC26947.1| unnamed protein
product [Mus musculus] 


HG1000076N0 160000_genejpredictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1O00081N0 160000 .^ejjredictio 
nl 


gi|4502549|ref|NP_001734.1| calmodulin 2

(phosphorylase kinase, delta); phosphorylase
kinase delta [Homo sapiens] 


HG1000106NO 160000_jenejpiBdictio 
nl 


gi|6680305|ref|NP_032328.1| heat shock
protein, 84 kDa 1 [Mus musculus]


HG1000107N0_160000_gene_prediction1


gi|6681225|ref|NP_031905.1| developmentally
regulated GTP binding protein 1;
developmentally regulated GTP-binding
protein 1 [Mus musculus]


HG1000109N0_0jgene_predictionl . 


gi|6754774|ref|NP_034986.1| myosin heavy
chain, cardiac muscle, adult; alpha cardiac 
MHC; alpha myosin [Mus musculus] 


HG1000112N0 160000_£ene_predictio 
nl 


gi|23956080|ref|NP_058675.1| putative
serine/threonine kinase [Mus musculus] 


HGl 0001 16N0 160000_jenej)redictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus] 


HG1000126N0 160000_gene_predictio 
nl 


gi|6680305|ref|NP_032328.1| heat shock
protein, 84 kDa 1 [Mus musculus] 


HG1O0O130N0 160000_gene_pre<iictio 
nl J 


gi|20825377|ref|XP_143696.1| similar to
hypothetical protein dJ12208.2 [Homo
sapiens] [Mus musculus] 


HG1000132N0 160000_genejpiedictio i 
nl , 


gi|6754208|ref|NP_034569.1| high mobility
group box 1; high mobility group protein 1
[Mus musculus]


HG1000133NO 160000_genejpredictio 


gi|26347765|dbj|BAC37531.1| unnamed protein 
product [Mus musculus] ' 


HG1000134NO 20000jgene_prediction 
1 


gi|26382599|dbj|BAB22733,2| unnamed protein 
product [Mus musculus] 


. HG1000134NO 20000_gene_prediction 
2 


gi|26353738|dbj|BAC40499.1| unnamed protein 
product [Mus musculus] 


HG1000142N0 160000_geiiejpredictio 
nl 


gi|26353738|dbj|BAC40499.1| unnamed protein 
product [Mus musculus] 


HG1000144N0 20000 ?ene nredirfinn 
1 


gi)OD /yi uo|Tei|jNr__u JZ /4o. 1 1 nucleopliosiiun 1 ; 
nucleolar protein N038 [Mus musculus] 


HG1000145NO 160000 amp nnv1tnf{n 

nl 


gi|6677779|ref|NP_033107.1| ribosomal protein
L28; DNA segment, Chr 7, Wayne State
University 21, expressed [Mus musculus]


HG1000146N0 160000_5aQe_predictio 
nl 


gi|oo / / / /y|rei|NF_033 1 07. 1 1 nbosomal protein 
L28; DNA segment, Chr 7, Wayne State 
University 21, expressed [Mus musculus]


HOI 0001 SntJO lOOnO <Tmt> -mv^Ai^^ry^ry 

1 


gLp / 1 /y7ojemb|CAA73041 . 1 1 5S nbosomal 
protein [Mus musculus] 


HG1000152NO 160000_gene_predictio 
nl 


gi|11037798|refiNP_067621.1| dynactin 5; 
dynactin 4; p25 dynactin subunit [Mus
musculus] 


HG1000161N0 160000_genejpredictio 
nl 


gi|21536242|ref|NP_573499.1| glucocorticoid 
induced transcript 1 ; testhymin; 
thymocyte/spermatocyte selection 1 [Mus 
musculus] 


HG1000163N0 160000_genejpredictio 
nl 


gi|20819730|ref|XP_129359.1| hypothetical 
protein XP 129359 [Mus musculus] 


HG1000164N0_5000_gene_prediction1


gi|20835770|ref|XP_132127.1| similar to 60S
RIBOSOMAL PROTEIN L13 [Mus musculus]


HG1000165N0_1000_gene_predictionl 


gi|26340448|dbj|BAC33887.1| unnamed protein
product [Mus musculus]


HG1000166NO 160000_gene_predictio 
n2 


gi|26353666|dbj|BAC40463.1| unnamed protein 
product [Mus musculus]


HG1000167N0 160000_genejpredictio 
nl 


gi|27369878|ref|NP_766203.1| hypothetical
protein 5330403K09 [Mus musculus] 


HG1000171N0_40000_gene_prediction 


gi|26354683|dbj|BAC40968.1| unnamed protein 
product [Mus musculus] 


HG1000171N0 160000_gene_predictio 
nl 


gi|26325838|dbj|BAC26673.1| unnamed protein 
product [Mus musculus]


HG1000175N0 160000_gene_predictio 
n2 


gi|26325838|dbj|BAC26673.1| unnamed protein 
product [Mus musculus]


•1 

HG1000176N0_1000_genejpredictionl 


gi|26354216|dbj|BAC40736.1| unnamed protein 
product [Mus musculus]


HG1000176N0_160000_gene_predictio


gi|26337635|dbj|BAC32503.1| unnamed protein
product [Mus musculus] 


HG1000177N0_160000_gene_prediction1


gi|26337635|dbj|BAC32503.1| unnamed protein
product [Mus musculus] 


HG1000178N0_160000_gene_prediction1


gi|20884040|ref|XP_134731.1| endothelial
differentiation, sphingolipid G-protein-coupled
receptor, 5 [Mus musculus]


HG1000178N0 160000_^ene_predictio 
n2 


gi|13384830|ref|NP_079706.1| RIKEN cDNA
1110066C01 [Mus musculus]


HG1000180N0_1000_gene_prediction1


gi|13384830|ref|NP_079706.1| RIKEN cDNA
1110066C01 [Mus musculus]


HG1000181N0 10000_geiie_ptedictioii 
1 


ffill 3^R47'^0!T<»flNP 070/^40 1 1 PrK"I7M /^nXTA 
1 1 10005A23 [Mus musculus] 


HG1000181NO 160000_gaie_predictid 
nl 


hypothetical protein FLI38281 [Homo sapiens] 
[Mus musculus]


HG1000183N0 160000_gene_predictio 
nl 


gi|26334755|dbj|BAC31078.1| unnamed protein 
product [Mus musculus] 


HG1000186NO 20000_gene_prediction 
1 


gi|27370150|ref|NP_766364.1| hypothetical 
protein D630002G06 [Mus musculus] 


HG1000186NO 160000_gene_predictio 
n2 




HG1000187NO 20000^enej»rediction 
1 


gi|26342222|dbjlBAC34773.1| unnamed protein 
product [Mus musculus] 


HG1000187NO 160000_jgenejpredictio 




HG1000189N0_1000_gene_predictionl 


gi|25024769|ref|XP_207136.1| similar to ORF2
[Mus musculus domesticus]


HG1000i89N0_5000_genejpredictionl 


/ j*t\%Mjj |x>/A.v^ZrUUi&i . X 1 unnameo. proxem 
product [Mus musculus] 


HG1000189N0_1000^ene_prediction2 


BG:DS01759.1 gene product [Drosophila 
melanogaster] [Mus musculus] 


HG1000189N0_5000_gene_prediction2 


eil263257'?4ldhilRAP2fi621 11 mmmpA riTntmri 

product [Mus musculus]


HG1000195N0_10000jgene_prediction 


eil208799Q2lreflXP 140210 11 ^similarto 
BG:DS01759.1 gene product [Drosophila 
melanogaster] [Mus musculus] 


HG1000199N0 160000_genejpredictio 
nl 


gi|17390530|gb|AAH18231.1| Unknown 
(protein for MGC:19236) [Mus musculus] 


HG1000201N0_10000_jeae_prediction 


gi|20824845|reflXP_131963.1| expressed 
sequence C77020 [Mus musculus] 


HG1000203NO_5000_gene_predicticMil 


gi|27477269|ref|XP_209223.1| similar to
Transforming protein RhoC (H9) [Homo
sapiens]


HG1000204N0 10000^ene_prediction 
1 


gi|26333233|dbj|BAC30334.1| unnamed protein 
product [Mus musculus]


HG1000209NO 160000_£enejpredictio 

xa 


gi|26326739|dbj|BAC27113.1| unnamed protein
product [Mus musculus] 


HG1000215N0_5000jgene_piedictionl 


gi|27369784|ref|NP_766142.1| hypothetical 
protein A230053P19 [Mus musculus]


HGl 00021 5N0_1000_gene_predictionl 


gi|Oo/i /Doirei|JNl:'_U3 1732.1 1 suppressor of 
cytokine signaling 2; cytokine inducible SH2-
containing protein 2; high growth; STAT-
induced STAT inhibitor 2; cytokine-inducible
SH2 protein 2 [Mus musculus] 


HG1000219N0 IOOOOjgene_prediction 
1 


eil26328915ldbilBAC7f?1 06 1 1 nnniim^yl nmf^in 

product [Mus musculus] 


HG1000221N0 160000_genejpredictio 


eil45042551reflNP 002007 11 N9A Viictrmf* 
family, member Z; H2AZ histone [Homo 
sapiens] 


HG1000221NO 20000^eiiejjtediction 
1 


gi|11360345|pir||T42725 actin binding protein
ACF7, neural isoform 1 - mouse (fragment)


HG1000223N0 160000^ene_predictio 
nl 


gi|11360345|pir||T42725 actin binding protein
ACF7, neural isoform 1 - mouse (fragment) 


HG1000225N0 160000_gene_predictio 
nl 


gi|25019988|reflXP_207469.1| similar to 
Retrovirus-related POL polyprotein [Mus 
musculus]


HG1000235N0 160000 _£ene_piedictio 
nl 


gi|20137004|reflNP_03532ai| proteasome 
(prosome, macropain) 28 subunit, beta;
proteasome (prosome, macropain) 28 subunit, beta
[Mus musculus]


HG1000236NO 160000_gene jpiedictio 
nl 


gi|15617197|ref|NP_077135.1| ATPase, H+
transDortin&^ Ivsosnmal 1 ^vr> vi GiiKunif n 

isoform 1; ATPase, H+ transporting, lysosomal
(vacuolar proton pump) [Mus musculus] 


HG1000238NO 160000Jgene_predictio 
nl 


gi|6671704|refiNP_031664.1| chaperonin 
subunit 7 (eta) [Mus musculus] 


HG1000238NO_5000 ^ne_predictionl 


gi|6671549|ref|NP_^031479.1| anti-oxidant 
protein 2; acidic calcium-independent 
phospholipase A2; peroxiredoxin 5; 1-Cys Prx
[Mus musculus]


HG1000239NO_16000D jgene_pre<iictio 
nl . 


gi|6671549|ref|NP_031479.1| anti-oxidant 
protein 2; acidic calcium-independent 
phospholipase A2; peroxiredoxin 5; 1-Cys Prx
[Mus musculus]


HG1000241N0_160000_gene_predictio 
nl J 


gi|7657357|reflNP_056596.1| nucleosome 
assembly protein 1-like 1; nucleosome 
assembly protein-1 [Mus musculus]


HG1000243N0_160000_gene_prediction1


gi|4759158|ref|NP_004588.1| small nuclear
ribonucleoprotein D2 polypeptide 16.5kDa;
small nuclear ribonucleoprotein D2 polypeptide
(16.5kD) [Homo sapiens]


HG1000243N0 160000_^ene_predictio 
n2 


gi|8393534|ref|NP_058653.1| high mobility
group protein 17 [Mus musculus] 


HG1000245N0_1000_gene_prediction1


gi|8393534|ref|NP_058653.1| high mobility
group protein 17 [Mus musculus]


HG1000250N0 160000^ene_predictio 
nl 


gi|12850132|dbj|BAB28604.1| unnamed protein 

product [Mus musculus] 


HG1000252N0_160000_genejpredictio 
nl 


gi|20824845|ref|XP_131963.1| expressed
sequence C77020 [Mus musculus] 


HG1000255N0 10000jgaie_prediction 
1 


gi|17105394|ref|NP_000975.2| ribosomal
protein L23a; 60S ribosomal protein L23a;
melanoma differentiation-associated gene 20
[Homo sapiens]


HG1000262N0 160000_gene_piedictio 
n2 


gi|13385532|ref|NP_080303.1| RIKEN cDNA
2700086123 [Mus musculus]


HG1000263NO 160000_genej)redictio 
nl 


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus]


HGl000264N0_5000_gene_predictioal 


gi|26360198|dbj|BAB25612.2| unnamed protein 

product [Mus musculus]


HGl 000264N0_5000_gene_prediction2 


gi|21624617|ref|NP_081018.1| RIKEN cDNA
1110007M04 [Mus musculus]


HG1000265NO 160000_gene_predictio 
nl 


gi|21624617|ref|NP_081018.1| RIKEN cDNA 
1110007M04 [Mus musculus]


HGl 000266N0_0_jgene_predictionl 


gi|25070241|ref|XP_192786.1| proline rich
protein expressed in brain [Mus musculus]


HG100b266N0 160000jgenejpredictio 
nl 


gi|12584972|ref|NP_075021.1| lipin 3 [Mus
musculus] 


HGl 000267N0_5000_gene_predictionl 


gi|26340094|dbj|BAC33710.1| unnamed protein

product [Mus musculus] 


HG1000270N0 16(K)00j6ene_predictio 
nl 


gi|6679937|ref|NP_032110.1| glyceraldehyde-
3-phosphate dehydrogenase [Mus musculus]


HG1000271N0 lOOOOjgenejprediction 
1 


gi|12844196|dbj|BAB26273.1| unnamed protein 

product [Mus musculus] 


HG1000271N0 160000j6ene_predictio 
nl 


gi|26345908|dbj|BAC36605.1| unnamed protein 
product [Mus musculus] 


HG1000273N0 160000_genejpredictio 
nl 


gi|26345908|dbj|BAC36605.1| unnamed protein
product [Mus musculus] 


HG1000295NO 160000_jene_predictio 
nl 


gi|20888943|ref|XP_129258.1| cDNA sequence
AF233884 [Mus musculus] 


HG1000296N0 160000_jgene_predictio 
nl 


gi|21313266|ref|NP_080089.1| RIKEN cDNA
1200003006 [Mus musculus]


HG1000299N0_160000_gene_prediction1


gi|25054735|ref|XP_192839.1| ATPase, class II,
type 9B [Mus musculus]


HG1000300NO 10000_£«ie_prediction 
1 


gi|6753882|ref|NP_034349.1| FK506 binding
protein 4 (59 kDa) [Mus musculus] 


HG1000306N0_0_geiie_predictionl 


gi|25024769|ref|XP_207136.1| similar to ORF2
[Mus musculus domesticus]


HG1000306N0L0_fiene_prediction2 




HG1000312NO 160000 ome ntndirtin 

nl 




HG1000314N0_iq00_jgene_predictionl 


gi|4506283|ref|NP_003454.1| protein tyrosine
phosphatase type IVA, member 1; Protein
tyrosine phosphatase IVA1 [Homo sapiens]


nl 


gi|4506285|ref|NP_003470.1| protein tyrosine 
phosphatase type IVA, member 2, isoform 1; 
protein tyrosine phosphatase IVA; protein 
tyrosine phosphatase IVA2; phosphatase of
regenerating liver 2 [Homo sapiens] 


HG1000330N0 160000 jgenejiredictio 
n2 


gi|6679553|ref|NP_033003.1| protein tyrosine
phosphatase, non-receptor type 2 [Mus
musculus]


HG1000330NO 160000 geiie_predictio 
d4 


gi|12860388|dbj|BAB31939.1| unnamed protein 
product [Mus musculus]


HG1000332N0_10000_gene_prediction1


gi|26344091|dbj|BAC35702.1| unnamed protein 
product [Mus musculus] 


HG1000337N0_1000_gene_prediction1


gi|20987322|gb|AAH3qi85.1| Unknown 
(protein for MGC:29401) [Mus musculus] 


HG1000341N0_5000_gene_predictionl 


gi|4506725|ref|NP_000998.1| ribosomal protein
S4, X-linked X isoform; 40S ribosomal protein
S4, X isoform; ribosomal protein S4X isoform;
single-copy abundant mRNA; cell cycle gene 2
[Homo sapiens]


HG1000341N0_10000_gene_prediction1


gi|26332837|dbj|BAC30136.1| unnamed protein
product [Mus musculus]


HG10003S3NO 160000 ot-np nrpAictin 

nl 


/i^ /yoy|rer|JNr_4/3io4.i| Musasni 
homolog 2 (Drosophila) [Mus musculus] 


HG1000357N0 20000_gene_prediction 
1. 


j 5i|zouz 1 |rei| Ar_zu / y4 1 . 1 1 sunilar to 
Retrovirus-related POL polyprotein [Mus 
musculus] 


HG1000358NO_50pO_gene_predictionl 


gi|27372319|dbj|BAC53724.1| Piccolo [Mus 
musculus] 


HG1000359N0_1 60000_geaie_predictio 
nl 


gi|3599320|gb|AAC72793.1| 0RF2 [Mus 
musculus domesticus] 


HG1000363N0J60000_£eDejpredictio 
nl 


gi|3599320|gb|AAC72793.1| 0RF2 [Mus 
musculus domesticus] 
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HG1000364N0 160000_gene_predictio 
nl 


gi|19484126|gb|AAH25846.1| Unknown 
(iROtein for MGC:32383) [Mus musculus] 


HG1000367N0 160000_geDe_predictio 
nl 


gqi3928676|ref|NP_l 13687.1| proline rich 
protein 2 [Mus musculus] 


HG1000379N0 160000_£ene_predictio 
nl. 


gi|20863632|rcipCP_164160.1| hypothetical 
protein XP^164160 [Mus musculus] 


HG1000390NO lOOOOngenejnediction 
1 


gi|3599320|gb|AAC72793.1| 0RP2 [Mus 
musculus domesticusl 


HG1000390N0_5060_^e_piedictionl 


gi|20892585|refpa>_147977.1| RIKEN cDNA 
2610001E17 (Mus musculus] 


HG1000391NO 160000_jene_predictio 
nl 


gi|20892585|repi»J47977.1| RIKEN cDNA 
2610001E17 [Mus musculus] 


HG1000396N0 160000jgeaie_predictio 
n2 


gi|26330368|dbj|BAC28914,l| umiamed protein 
product [Mus musculus] 


HG1000401N0 lOOOO^enejpredictton 
1 




HG1000407NO 160000 eesne nrediVtin 
nl 


frill ^ft^'^/^O^MKIlT) A T>OnP1 n 1 1 »^-Tr.r^—. .. .1 J. • 

gi|izoDjoyD|aDj|i3Ai>zyoiy.i| uimameaprotem 
product [Mus musculus] 


HG10004O8N0 160000jg(aie_predictio 
d2 


gi|z;?uzy3ou|rei| Ar^zUooy 1 . 1 1 sumlar to 

PROBABLE POL POLYPROTEIN [Mus 
musculus] 


HG1000414N0 160000_geiie_predictio 
n2 


gi|26326871|dbj|BAC27179a| umiamed protein 
product [Mus musculus] 


HG1000416N0 160000 _genejpredictio 
nl 


gi|20902061[refpCP_147959.1| hypothetical 
protein XP_1 47959 [Mus musculus] 


HG1000428N0 160000_gene_predictio 
nl 


gi|25032567|ref|XP_207391.11 similar to ORF2 
'Mus musculus domesticus] 


HG100Q429NO 160000_£ene_predictio 
nl 


gi|25022040|ref|XP_:204233.1| similar to ORF2 
[Mus musciilus domesticus] 


HG1000431N0 20000_gene_prediction 
1 


gi|26339864|dbj|BAC33595.1| umiamed protein 
product [Mus musculus] 


HG1000435N0 160000_gene_predictio 
ol 


gi|8394057|ref|NPJ)58565.1| low density 
lipoprotein receptor-related protein 4; low 
uvuoiijf iipvipiviiciii--iciai6a proiem l^ow 
Density Lipoprotein Receptor Related Protein 
4; corin [Mus musculus] 


HG1000441N0 160000^ene_predictio 
nl 


gi!26340972|dbj|BAC34148.1| unnamed protein 
product [Mus musculus] 


HG1000441N0_160000_^ej)redictio 

n2 


gi|12836479jdbj|BAB23675.1| unnamed protein 
product [Mus musculus] 


HG1000446NO 160000_£ene_predictio 
nl 


gi|25029827|ref|XP_207226,l| simflarto ORF2 
Mus musculus domesticus] 


HG1000446N0 160000 gene_predictio i 
n2 |] 


gi|25031497|ref|XP_207552.1| similar to
Retrovirus-related POL polyprotein [Mus
musculus]


HG1000449N0_160000_gene_prediction2


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus]


HG1000451N0J60000^enej)redictio 

nl 

11 ji 


gi|25054p21|refpap_19281 1.1| sinular to 
iiBnsincixiDrcuic pioi6ase, senns z 
(Epifheliasin) (Plasndc transmembrane protein 
ICS FMus musculusl 


HGl000455N0_10000_£ene_pi©diction 

1 
1 


gi|20846744|refpa»_144090.1| similar to 
lypothetical protein FLJ12457 [Mus musculus] 


HG1000461N0_10000^enej)rediction 

1 . 
1 


gi|20824899|iepP_144255.1| hypolJietical 
nmtein XP 1442SS rMus musculusl 




gi|12853695|dbj|BAB29819.1| unnamed protein 

nnoduct PMus musculusl 




gi|12834707|dbj|BAB23011.1| unnamed protein 
nroHuct fMufi iniiRCuliisI 


HG1000489K0_160000 jgaie_predictio 

Til 


TkCS T^loct" \\\\ 
IIU Uldov UlL 


HG1000499N0_160000_gene_j)redictio 
ni 


gi|35993201gb|AAC72793.1| 0RF2 [Mus 

tTm<if*iilitQ Hntnp^sticiisl 

JJJ.LioV'UlUO vLV/lXlwOUvUOJ 


HG100050aNOLl60000ijene_predictio 

ni 


gi|20912903|reflXP_126663.1| RKEN cDNA 
9A1 01 S4T1 6 rMn<; mii<;culusl 

Zi'T X vi JL •^•t J X vJ i^xvxvio I imovMjmoj 


HG1000505N0_160000_gene_predictio 

Til 


gi|2504495lMXP_195302.11 similar to 

nlfartnrv recentor MOR256-23 FMus musculusI 


HG1000509N0_10000^enejirediction 

1 
1 


gi|26334721|dbj|BAC31061.1| unnamed protein 

pXV/VXUVC (_lVXUO UAUOV/lUVtOJ 


HG10OO510N0 160000 ^ene_predictio 
nl. 


gill2834707|dbj|BAB2301 1.1| unnamed protein 
product [Mus musculus] 


HG1000513N0_160000^ene_predictio 
nl 


gill2859663|dbj|BAB31727.1| unnamed protein 
product [Mus musculus] 


HGl 0005 19N0_160000_gene_predictio 
nl 


gi|1191461sp|P200011BFll_CRIGR Elongation 
&ctor 1-alpha 1 (EF-l-alpha-1) (Elongation 
tactor 1 A-1) (ebr lA-lj (^blongation lactor luj 
(EF-Tu) 


HG1000521N0_160000_gene_predictio 
nl 


gi|Z4yjjUl|sp|yo3yj4|jJxoi3_MUU orain- 
specific homeobox/POU domain protein 3B 
(BRN-3B) (BRN-3.2) 


HG1000524NO loOOOO gene predicao 
hi 


gi|zi/csuiz^|ciDj|i5Ai5yo/ou,i| lype a^yi 
collagen [Mus musculus] 


HG1000530N0_20000jgenejprediction 

1 • ■ . 


ffil6679921 IreflNP 032102 11 gamma- 

^X|VF\/ / ✓✓A»X lldlXNx ^vrJArX V^*X| gCUXXXUw 

aminobutyric acid (GABA-A) receptor, subunit 
rho 2 [Mus musculus] 


HG1000530NO_1 60000_£ene_predictio 
n2 


gi|23622684|repPJ56394,2| e>5)ressed 
sequence AL023001 [Mus rdusculus] 
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HG1000534NO_1600pO_gene_predictio 
nl • 

111 




HG1000545N0j60000_jgCTiejredictio 
til ] 


gi|359932bigb|AAC72793.1| 0RF2 {Mas 
musculus domesticus] 


HG1000549N0_160000_^enejpreclictio j 
m 


gi|26341288|dbjlBAC34306.1| unnamed protein 
product [Mus musculus] 


HG1000549N0_160000 _gene_ptedictio 


gi|35993201gJ)|AAC72793.1| 0RF2 [Mus 
nusculus domesticus] 


HG1000549NO_160000 jgenejpiedictio 
n3 ; 


gi|21312126|ref|NP_081 135.11 RKEN cDNA 
1 10068E1 1 [Mus musculus] 


HG1000553N0 160000_gene__predictio 
nl 


^|3599320|gD|AAC fyo. 1 1 UKrZ [JVius 
musculus domesticus] 


HG1000560N0_160000jgene_predictio 


l^2503255D|rei|AJr_zu/4lz.l| suiuiar lO 
letrovirus-related POL polyprotein [Mus 

TTmoT'.nlii*;! 


HG1000562N0_160000jgeiiejiedictio 
nl 




HG1000566NO_40000 ^ne^piediction 
1 


gi|20856064|reflXP_151615.1| hypotiietical 
TiTntein "XP 151615 TMus musculusl 


HG1000566N0_16()0()0^enejiredictio 
nl 




HG1000582N0_160000_genejredictio 
nl 


gi|7656873|reflNP_056579.11 RIKEN cDNA 
S7'^0S8'?1C22 erne fMus musculusl 


HGlOOOSPSNO^ieOOOOjgenejpiedictio 
nl 


gi|4512261|dbjlBAA75227.1| neuiochondrin-2 

"Ml 1*1 Tnii^culufsl 


HG1000606N0_20000^enej)rediction 
I 


gi|19527.094|reflNP_598640.1| e3q)ressed 
semience AI327031 FMus musculusi 


HG1000607N0_160000_^enejpiedictio 

nl . 


gi|25058382|repP_206318.1| hypothetical 
■nrotein XP 206318 fMus musculusl 


HG1000608N0_20000_genejprediction 
1 


gi|3599320|gblAAC72793.1| ORF2 [Mus 
musculus domesticusl 




gi|26387941|dbj|BAC25633.1| umiamed protein 
Dioduct TMus musculusl 


HG1000622NO_160000 _gene_predictio 

til 




EG 1 000623N0_1 60000_£enejpredictio 
nl 


mionOOil 1 OOl-Mk-UVnO 1 ^^An^ 1 1 Vixjmnfliptinjll 

gl|ZU7U4 1 Zirjiei|-AJL ^ jOU J . 1 1 nypuuicut*u 
protein XP 155605 [Mus musculus] 


HG1000624N0_160000_£ene_predictio 
nl 


mil ^^/io/^nUfvUl A AWn^^^^ 11 niitntiv#* cViloride 
gj|l jDHZOy^lgD|/VArivjJjj.i| pUulUVO i^iuuiiviv/ 

channel (similar to Mm Clcn4-2) [Mus 
musculus] 


HG1000625N0_160000_jene_predictio 
nl 


gi|20901495|ref|XPj40099.1| RIKEN cDNA 
9 130404H23 [Mus musculus] 


HG1000628N0_40000jgenejpiediction 
1 


gi|3599320|gb|AAC72793,l| 0RF2 [Mus 
musculus domesticus] 
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HG1000628N0_20000_genejprediction 


5[l26339720|dbj|BAC33523.1| unnamed proteinl 
}Toduct [Mus muscolus] 


• • 

TTOI 000658N0 5000 eene Dredictionl i 


ri|35993201gb|AAC72793,l| 0RF2 [Mus 
nusculus domesticus] 


HG1000642N0_160000_£ene_predictio 
nl 




HG1000o4oN0_lo0u(K} jgeiiej>re<ucuo 
nl 


tnn^QQ'\0(\\MAACl')19'^ 11 ORF2 FMus 1 
musculus domesticxis] 


HG10P0649N0J60000jgenejpredictio 


rwiio^Aii07l 7l*«aflYP 1 A0/^4n 91 ^milar to &ene 1 
3bp73D protein - firuit fly 0>rosophila 
mdfmogaista') [Mus musculus] 


HG1000650N0_160000 jsbos _predictio 
HI 


gi|3599320|gb|AAC72793.1| 0RF2 [Mus 
musculus domesticus] | 


HG1000652N0_160000^enejredictio 
Hz 


gi|26377673|dbj|BAC25377.1| unnamed protein 
product [Mus musculus] 1 


HG1000656N0_160OO0 ^enejredictio 
nl 


gi|13384666|reflNP_079583.1| nuclear receptor 
jindinc fector 2 fMus musculus] 


HG1000656N0_16OOO0_gene jpredictio 
n2 


gi|25050704|repP_1334652| RIKEN cDNA 
2410004H02 FMus musculus] 1 


HG1000659N0_20000 jgenejrediction 
1 


gi|25050704lref|XPJ33465.2| RIKEN cDNA 
241 0004H02 [Mus musculus] 


HG1000661N0_20000jgene_prediction 
1 


gil26333733|dbj|BAC30584.1| unnamed proteinl 
product [Mus musculus] 


HG1000664N0_160000_.£enejpredictio 
nl 


gi|27372319|dbj|BAC53724.1| Piccolo [Mus 
musculus] j 


HG1000670NO_160000 _^ene_predictio 

ril 

ni 


gi|6680195|re£|NP_032255,l|histone 1 
cieacetyiase z^ j-^in a scguiciiiy \^ul i ty ajruw i 
State University 179, expressed [Mus 
musculus] 1 


HGl 000685N0_1 600p0_genq jpredictio 
nz 


gill7313266|refINP_478121.1| RecQ protein- 
like 4 [Mus musculus] 


HGl 000690N0_20000jBenejprediction 

1 
1 




HG1000690NO_20000 jgenejrediction 

9 ■ 


gi|26340662|dbj|BAC33993.1| unnamed proteinl 
product [Mus musculus] 


HGl 000696N0_20000jgenejpi:ediction 

1 
1 


gi|26340662|dbj|BAC33993.1| unnamed proteinl 
pixxhict [Mus musculus] 


HG10G0696N0jM)000^enejrediction 
1 

1 • 


gi|26326171|dbj|BAC26829.1| unnamed proteinl 
product [Mus musculus] 


HG1000697N0_160000 _genej)redictio 
nl 


gi|25024387|refpCP_207341.1| hypothetical 
protein XP 207341 [Mus musculus] 


HG1000700N0J60000 jgenejpredictio 
n2 


gi|26351279|dbj|BAC39276.1| unnamed protein' 
product [Mus musculus] 


HG1000704N0 160000 eene predictio 


gi|2l644579treflNP 660253,11 WilUams- . 
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nl 

1 


Beuren syndrome critical legion gene 17 [Mus 
nusculus] 


HGl 00071 lN0_20000jgeneJ)iediction 


gi|23273683|gb|AAH37239.1| Similar to 
iG[^-associated athanogene 4 [Mus musculus] 


HG100d738NO_160000 ^enejredictio 

nl 
111 


gi|128S6848|dbj|BAB30802.1| unnamed protein 
product [Mus musculus] 


HG1000739NO_160000jgene j>iedictio 


gi|26339470|dbj|BAC33406.1| umiamed protein 
product [Mus musculus] 


HG1000739N0_160000^enej)redictio 


gi|3599320|gblAAC72793.1| 0RF2 [Mus 
musculus domesticusi 


HG10p0740N0_10000^enejprediction 


gi|3599320|gb|AAC72793.1| 0RF2 [Mus 

mu^iilus domesticusi 


HG1000743N0_160000^nejredictio 
HI 


gil23601536|repP_130965.2| Nice-4 protein 

inmolocr FMus musculus! 


HG1000779NOJ60000 ^enejpredictio 
nl 


gi|2627027|dbj|BAA23475.1| F^l {Mus 

miicmiliiQl 


HG1000781N0_160000jgene_predictio 
nl 


gi|25023334|ref|XP_204722.1| similar to 
brmin [Mus musculus] 


HG1000781N0_160000_gaie_predic1io 
n2 


gi|2o350877|aoj|BAC3y075.l| umiameaproiem 
pioduct [Mus musculus] 


HG1000786NO_160000 ^enejpredictio 
nl 


gi|25023581|ret|Xr__2O71U3.1| sumiarto 
Rjkrovirus-related POL polyprotein [Mus 
musculus] 


HG1000788NO_1000 _jene_predictionl 


gi|2o34u832|abj|BAC340 /o, i| imnamca proxem 
product [Mus musculus] 


HGI000799N0 20000_gene_prediction 
1 


gi|20847912|reilXr__144olU.l| sumiarto 
KIAA1904 protein [Homo sapiens] [Mus 
musculus] 


HG1000808N0 1 60000 jg«ie_predictio 
nl 


gi|2o345yoU|clDj |li AC J ooj 1 . i | unnamea proieiii 
product [Mus musculus] 


HG1006817N0 160000 _gene_predictio 
nl 


gi|20882231|refpCPJ39203.1| similarto 
KlAAOojo protem [xiomo sapiensj lmus 
musculus] . 


HG1000822N0_20000jgene_predictiori 
1 


gi|13242237|refiNP_077327,l| Heat shock 
cognate protem 70; heat shock 70kD protem 8 
[Rattus norvegicus] 


HGlb00824N0 160000 .^e_piedidtio 
nl 


gi|6680195|refINP_032255.1| histone . 
Qcacetyiase z, unA scguicui, \^nr lu, wajriic 
State University 179, e^qjressed [Mus 
musculus] 


HG1000824N0 10000 jgene_piediction 
1 


gil20883564|repP_152815.1| hypothetical 
protein XP 152815 [Mus musculus] 


HG1000839N0 160000 _gene_predictio 
nl 


gi|20883564|TefpCP_152815.1| hypothetical 
protem XP 152815 [Mus musculus] 
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HG1000842N0 1 60000 jgMie_predictio 
nl 


gi|26339496|dbj|BAC33419.1| unnamed protein 
iroduct [Mus musculus] 


HG1000842N0 160000 _jene_predictio 
n2 


gi|3599320|gbIAAC72793.1 1 0RF2 [Mus 
musculus domesticus] 


HG1000869N0_160000_^ene_predictio 
nl 


gi|6715564|reflNP_032607.1| melanoma 
antigen, 80 kDa [Mus musculus] 


HG1000870NO_160000 _jene_predictio 
nl 


gii20881 174|refpa»_147875.1| hypoliietical 
irotein XP_147875 [Mus musculus] 


HG1000870N0_160000.jene_predictio 
n2 


giI27369942|rcf|NP_766246.1| hypoliietical 
jTotein 9530051F04 [Mus musculus] 


HG1000878N0_20000_jgene_prediction 
1 


gi|27369942|ref|NP_766246.1 1 hypothetical 
protein 9530051F04 [Mus musculus] 


HG100D878N0_20000_^ene_predicti6n 
2 


gi|27369942|refjNP_766246.1| hypothetical 
protein 9S30051F04 [Mus musculus] 


HG1000904N0_160000_gene_predictio 
d2 


gi|27369942|refINP_766246.1| hypothetical 
protein 9530051F04 [Mus musculus] 


HG1000904N0j«)000_^ene_prediction 
1 


gi|3599320|^|AAC72793.1| 0RF2 [Mus 
musculus domesticus] 


HG1000906N0 5000 eene medictionl 


gi|3599320|gblAAC72793.1| 0RF2 [Mus 

musculus domesticus] 


HG1000906N0_160000_geiie_predictio 
n2 


gi|20836822jrefpa>J30277.1| siaularto 
Plakophilin 4 (p0071) [Mus musculus] 


HG100091 ONO^ieOOOO^ene jpredictio 
nl 


gil35993201gb|AAC72793,l| ORF2 [Mus 
musculus domesticus] 


HG1000948N0_160000^eae_predictio 

nl . . 


gi|26325846|dbj|BAC26677.1| unnamed protein 
product [Mus musculus] 


HG1000955N0 1 60000 jgene_predictio 
nl 


gi|35993201gb|AAC72793.1| 0RF2 [Mus 
musculus domesticus] 


HG1000959N0 160000^enej)redictio 
nl 


gi|7670427]dbj|BAA95065.1| unnamed protein 
product [Mus musculus] 


HG1000959N0_5000_^ene__predictionl 


gil22507385|reflNP_081019.1| RIKEN cDNA 
1 1 10014F12 [Mus musculus] 


jtiwiuuuyyujNU Duuu_jgencj)reuicuuiii 


gi|22507385|iefiNP_081019.11 RKEN cDNA 


HG1000994N0 10000_^e_prediction 
1 


gi|10946762|refINP_067382.1| triggering 

triggering receptor expressed on monocytes 3 
[Mus musculus] 


HG1000994N0 160000_gene_predictio 
n2 


gi|12855175jdbj|BAB30238.11 unnamed protein 
product [Mus musculus] 


HG1000994N0 10000_£ene_ptecliction 
2 


gi|12855175|dbj|BAB30238.1| unnamed protein 
. product [Mus musculus] 


HGIOOIOOINO 160000jgene_predictio 
nl 


gi|12855175ldbj|BAB30238.1| unnamed protein 
product [Mus musculus] 


HGIOOIOOINO 0 eene Dredictionl 


gil26337385|dbj|BAC32378.1| unnamed protein 
jroduct [Mils musculus] 


HG1001002N0J60000jgenej)redictio 

Til 


gi|27370034|refINP_766297.1| hypothetical 
jrotein A530025 J20 [Mus musculus] 




gi|20348159|reflXP_l 1 1588.1| similar to 
TRAV9D-3 FMos moscolusl 


HG1001007N0_160000 _jenej)redictio 


gi|27370034|ief|NP_766297.1| hypothetical 
jiotein A530025J20 [Mus musculus] 


HGIOOIOIINO 160000 _jene_predictio 
nl 


gi|130970001^|AAH03291.1| Similar to 
lypothetical protem FLJ10342 [Mus musculus] 


HGIOOIOI INO 160000^Mie_predictio 
n2 


p|2o33o525|(lb] |B AC3 1 y43. 1 1 unnamea protein 
product [Mus musculus] 


HG1001014N0_160000 _gene_predictio 
m 


j^|25()47957|reilXP_1305o2,2| similar to 
hypoflietical protein MGC14161 [Homo 


XJ/T.1 nni A1 /IXin ^nnn mat^o 'rrr*»/^4r»'rinTi1 

ilvjiUUiui4iNU Duuu^^ene preaicuoin 


gi|263373851dbj|BAC32378.1| unnamed protein 


HG1001017N0__160000^ene_predictio 

Til 


gi|26337385|dbj|BAC32378.1| unnamed protein 

nmHiirt rMn^i Tnii*sCiilii5>l 


HG1001020NO_160000 _jene_predictio 
HI 


gil25019831|TeflXP_207463.1| similar to 

PDSOR rN/Tii^ mii^culusl 


HG1001024NO_160000 ^enejpredictio 
nl 


gil26338976|dbj|BAC33159.1| unnamed protem 

TnYvfii/*f rX4^iiQ mnQCiiliisI 


HG1001024N0_160000_gene_predictio 

HZ 


gi|20915148|iepP_149841.1| hypothetical 

rimfpi-n *XP 14Q$^41 rN/Tus miisciiliisl 

LyxWVWULl. <f\ ' JL^^U^A IXTXIXO XXXvm9\/VUUOJ 


HGl 00103 1N0_1 eOOOO^enejpredictio 

111 


gi|20915148|refp(P_149841.1| hypothetical 
nmteiti XP 14Q841 fMus musculusl 


xlvjlUUlUj jiNU D\j\jyj^_gpiiG prcuAOULuui 


gi|25071690MXP_193591.1| hypothetical 

nmteiTi ICP 1 0*^501 FMus musculusl 


HG1001043NO 160000_£eiie_predictio 
nl 


gi|263472491dbj|BAC37273.1| unnamed protein 
pioduct [Mus musculus] 


HG1001046N0 5000jgene_predictionl 


gl|oo7o7 1 4|rei|Nr JDizj J / • 1 1 lympnoia- 
restricted membrane protein [Mus musculus] 


. HG1001046N0_160000_gene_predictio 

nl 
ill 


gi|z504o9oy|reipU:'_14JoUi.3| Similar to 
bA401.1 (novel protein) [Homo sapiens] [Mus 
musculusl 


HG1001047N0 1000_^enej)redictionl 


gi|25021 180MXP^207917.1| snnilarto RNP 
particle component [Mus musculus] 


HG1001048N0 160000 jgene_predictio 
nl 


g^ZoDDD /Z4|ClDj|oAi^4U4yz.i| unnamea proiem 
product [Mus musculus] 


HG1001048N0 160000 _gene_piedictio 
n2 


gi|20343845|reflXPJ09652.1| similar to 
hypothetical protein FLJ25217 [Homo sapiens] 
[Mus musculus] 


HG1001144NO 20000 gene prediction 


fii|20346197|ref|XP 1 10161.11 RAN binding 
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protein 1 [Mus musculus] 


HGlOOl 148N0_160000_gene_predictio 
n2 


gi|3599320Igb|AAC72793.1| 0RF2 [Mus 
musculus domesticus] 


HG1001172N0_1 60000 .^ejpredictio 
nl 


gi|263396281dbj|BAC33485,l| unnamed protein 
product [Mus musculus] 


HGlOOl 1 72N0_20000_gene_precliction 


gi|22122489|ref|NP_666128.1| hypothetical 
protein MGC38936 [Mus musculus] 


HGlOOl Ie7N0 160000 bshq preoicuo 
nl 


JllZO'fU /UO|CLDJ|£>/\^JHUi^«i| lITlTiaillCU pxUlClU 

product [Mus musculus] 


HGlOOl 192N0 J 60000 _gene_predictio 

nl 

HI 


mil Q>iO'70OniivftfrMP nfiAn^^ 1 1 nrr^tAin IritiAQP 
1 o*ty /Z5/U|IcI|iN Jt^Uo*l'U ju» 1 1 proieiU jsJLUctoc 

rafl; murine sarcoma 3611 oncogene 1; 
sarxx)ma 3611 oncogene [Mus musculus] 


HG1001194N0_160000 ^eriejpredictio 

Til 


gil3599320|gb|AAC72793.1| 0RF2 [Mus 
musculus domesticus] 


HGlOOl 199N0_16OO00 ^ene^predictio 

Til 
111 


gi|20837732|reflXPJ3224Ll| hypothetical 
protein XP 132241 [Mus musculus] 


HGlOOl 199NO_160000_gene j)redictio 

iiZ 


gi|20071068|gb|AAH27341.1| Similar to 
elongation factor G2 fMus musculusi 


HG1001220N0_160000_gene_predictio 

111 


gi|20071068|gb|AAH27341.1| Similar to 
elonsation fector G2 fMus musculusI 


HG1001223N0_160000^ene_predictio 
m 


gil20908735|refpCP_122598,l| similar to helix- 
HftjrfflhiliT^inp^ nmtein - rat FMus musculusi 


HG1001229NO_160000 ^ene^predictio 


gi|25024769|refpa»_207136.1| similar to OBF2 
FMus mnscQlus domesticQSl 


HOI nOl^'^ONO SOOO pene nredictionl 


gi|6754206|reflNP_034568.1|hexokiiiase 1; 
downestst anemia fMus musculusi 


HG1001235N0_160000_^enejpredictio 

Ul 


gi|12857205|dbjlBAB30930.1| unnamed protein 
product [Mus musculus] 


HG1001235N0_10000^enej)i:ediction 

1 
1 


gi|21703918|ref|NP_663438.1| hypothetical 
protein BC0241 18 [Mus musculus] 


HG1001235N0_20000_genejredictioii 
1 

L 


gi|263393381dbj|BAC33340.1| unnamed protein 
product [Mus musculus] 


HG1001235NO_160000 _gaie_predictio 

LLC/ 


gi|26339338|dbj|BAC33340.1| unnamed protein 
product [Mus musculus] 


HG1001235N0_160000^enej)redictip 
n3 


gi|26340904|dbj|BAC34114.1| unnamed protein 
product [Mus musculus] 


HG1001260N0_160000_^eiiejpiedictio 
nl 

L±L 


gi|26327795|dbj|BAC27638.1| unnamed protem 
product [Mus musculus] 


HG1001260N0jM)000jgene_piediction 
1 


gi|8922328|ref|NP_060517.1| hypothetical 
piotem FLJ10290 [Homo sapiens] 


HG1001264N0 160000 ^enejredictio 
nl 


gi|8922328MNP.060517.1| hypothetical 
protein FLJ10290 [Homo sq)iens] 


HG1001274NO 160000 gene Diedictio 


gi|26383198|dbi|BAC2SS20.1| unnamed protein 
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nl 


product [Mus musculus] 


HG1001284N0 160000_jgene_predictio 
nl 


gi|3599320|gb|AAC72793.1| 0RF2 [Mus 
musculus domesticus] 


HG1001284NO 160000 _geiiB_predictio 
ii2 


gi|26326843|dbj|BAC271$5,l| unnamed protdn 
pioduct (Mus musculus] 


HG1001292NO 160000 ?ene nreHictin 

nl 


gi|^Qozoo^o|aQj|x>/vv^z / lOD. i| unnamea proiein 
product [Mus musculus] 


HG1001302N0 160000_gene_predictio 
nl 


gi| 1 juy /^^z|goiAAxiu J4z 1 - 1 1 oumlar to 
ATPase, H+ transporting, lysosomal (vacuolar 
proton pump) 3 IkD [Mus musculus] 


nl 


gi| xzoDZo^ 1 |aoj|i5Ai3zy4oo. 1 1 unnamed protem 
product [Mus musculus] 


HG1001323N0 160000_^ene_predictio 
nl 


gi|Z DUji 141 |rei| Ar_iyi73y. 1 1 sunilar to 
betaine-homocysteine methyltransferase 
FRattus norvecicusl fMus musculusl 


HG1001328NO_5000 _gene_predictionl 


gi|26347687|dbj|BAC37492.1| unnamed protein 

DFoduct FMus musculus! 


HG1001328N0 40000jgene_prediction 
1 


gi|26352918|dbj|BAC40089.1| unnamed protein 
nroduct FKfus musculusi 


HG100133 lNO_0_gene_predictionl 


gi|3599320|gb|AAC72793.1| 0RF2 [Mus , 
musculus domesticus] 


HG1001335N0 160000_gene_predictio 
nl 


gi|20381292|gb|AAH27770.1| stromal ceU 
derived factor receptor 2 [Mus musculus] 


HG1001335N0 160000 .^e_predictio 
n2 


gi|2193870|dbj|BAA20419.1| reverse 
transcriptase [Mus musculus] 


HG1001348N0 160000_gene_predictio 
nl 


gi|2193870|dbj|BAA20419.1| reverse 
transcrintase FMus musculusi 


HG1001349N0 1 60000 jgenejpredictio 
nl 


gi|20846538|ref|XP_150033.1| hypothetical 
protein XP_1 50033 [Mus musculus] 


nvj 1 i/u 1 J jHiN \j 1 ouuvvijgpiie preuiciio 
nl 


gi| /3UDziD|i«i|rviir_Uoojyy.i| jonase suppressor 
of ras [Mus musculus] 


HG1001361NO 160000_genejpredictio 
nl 


gi|oo/ooyu|rer|r>ir_ujZDZD.i| LdM nomeotx)x 
protein 5; LIM homeo box protein 5 [Mus 
musculus] 


HG1001376N0 160000_gene_predictio 
nl 


gi|20345901|ref|XP_109824.1| hypodietical 
protein XP_109824 [Mus musculus] 


HG1001376N0_5000_gene_prediction1


gi|27261816|ref|NP_080861.1| RIKEN cDNA
C530005J20 [Mus musculus]


HG1001376N0_20000_gene_prediction1


gi|27261816|ref|NP_080861.1| RIKEN cDNA
C530005J20 [Mus musculus]


HG1001376N0_5000_gene_prediction2


gi|27261816|ref|NP_080861.1| RIKEN cDNA
C530005J20 [Mus musculus]


HG1001376N0_5000_gene_prediction3


gi|27261816|ref|NP_080861.1| RIKEN cDNA
C530005J20 [Mus musculus]


HG1001417N0_160000_gene_prediction1


gi|27261816|ref|NP_080861.1| RIKEN cDNA
C530005J20 [Mus musculus]


HG1001417N0_1000_gene_prediction1


gi|26349767|dbj|BAC38523.1| unnamed protein
product [Mus musculus]


HG1001417N0_160000_gene_prediction2


gi|26349767|dbj|BAC38523.1| unnamed protein
product [Mus musculus]


HG1001417N0_160000_gene_prediction3


gi|26349767|dbj|BAC38523.1| unnamed protein
product [Mus musculus]


HG1001436N0_5000_gene_prediction1


gi|26349767|dbj|BAC38523.1| unnamed protein
product [Mus musculus]


HG1001436N0_20000_gene_prediction1


gi|20987280|gb|AAH29643.1| Unknown
(protein for MGC:25768) [Mus musculus]


HG1001436N0_160000_gene_prediction1


gi|25051637|ref|XP_194491.1| RIKEN cDNA
1110053F02 [Mus musculus]


HG1001439N0_160000_gene_prediction1


gi|25051637|ref|XP_194491.1| RIKEN cDNA
1110053F02 [Mus musculus]


HG1001484N0_160000_gene_prediction1


gi|6753290|ref|NP_033943.1| calsequestrin 1
[Mus musculus]


HG1001485N0_10000_gene_prediction1


gi|25029827|ref|XP_207226.1| similar to ORF2
[Mus musculus domesticus]


HG1001500N0_160000_gene_prediction1


gi|3599320|gb|AAC72793.1| ORF2 [Mus
musculus domesticus]


xICjIUOIdUUNU loUOUO gene preaicaO 


gi|oo79l0o|iei|KPj03274o.l|nucleopnp 1; 
nucleolar protein N038 [Mus musculus] 


nl 


gi|25029928|ref|XP_207257.1| similar to
Retrovirus-related POL polyprotein [Mus
musculus]




gi|20340683|ref|XP_110361.1| similar to
phospholipase C beta 2 [Rattus norvegicus] 
[Mus musculus] 
Examples

[0423] The examples, which are intended to be purely exemplary of the
invention and should therefore not be considered to limit the invention in any way,
also describe and detail aspects and embodiments of the invention discussed above.
The examples are not intended to represent that the experiments below are all or the
only experiments performed. Efforts have been made to ensure accuracy with respect
to numbers used (e.g., amounts, temperature, etc.) but some experimental errors and
deviations should be accounted for. Unless indicated otherwise, parts are parts by
weight, molecular weight is weight average molecular weight, temperature is in
degrees Centigrade, and pressure is at or near atmospheric.

[0424] While the present invention has been described with reference to the
specific embodiments thereof, it should be understood by those skilled in the art that
various changes may be made and equivalents may be substituted without departing
from the true spirit and scope of the invention. In addition, many modifications can
be made to adapt a particular situation, material, composition of matter, process,
process step or steps, to the objective, spirit and scope of the present invention. All
such modifications are intended to be within the scope of the claims appended hereto.

[0425] Additional objects and advantages of the invention will be set forth in
part in the description which follows, and in part will be obvious from the description,
or may be learned by practice of the invention. The objects and advantages of the
invention will be realized and attained by means of the elements and combinations
particularly pointed out in the appended claims. Moreover, advantages described in
the body of the specification, if not included in the claims, are not per se limitations to
the claimed invention.

[0426] It is to be understood that both the foregoing general description and
the following detailed description are exemplary and explanatory only and are not
restrictive of the invention, as claimed. Moreover, it must be understood that the
invention is not limited to the particular embodiments described, as such may, of
course, vary. Further, the terminology used to describe particular embodiments is not
intended to be limiting, since the scope of the present invention will be limited only
by its claims.

[0427] With respect to ranges of values, the invention encompasses each
intervening value between the upper and lower limits of the range to at least a tenth of
the lower limit's unit, unless the context clearly indicates otherwise. Further, the
invention encompasses any other stated intervening values. Moreover, the invention
also encompasses ranges excluding either or both of the upper and lower limits of the
range, unless specifically excluded from the stated range.

[0428] Unless defined otherwise, the meanings of all technical and scientific 
terms used herein are those commonly understood by one of ordinary skill in the art to 
which this invention belongs. One of ordinary skill in the art will also appreciate that 
any methods and materials similar or equivalent to those described herein can also be 
used to practice or test the invention. Further, all publications mentioned herein are 
incorporated by reference. 

[0429] It must be noted that, as used herein and in the appended claims, the
singular forms "a," "or," and "the" include plural referents unless the context clearly
dictates otherwise. Thus, for example, reference to "a subject polypeptide" includes a
plurality of such polypeptides and reference to "the agent" includes reference to one
or more agents and equivalents thereof known to those skilled in the art, and so forth.

[0430] Further, all numbers expressing quantities of ingredients, reaction
conditions, % purity, polypeptide and polynucleotide lengths, and so forth, used in the
specification and claims, are modified by the term "about," unless otherwise
indicated. Accordingly, the numerical parameters set forth in the specification and
claims are approximations that may vary depending upon the desired properties of the
present invention. At the very least, and not as an attempt to limit the application of
the doctrine of equivalents to the scope of the claims, each numerical parameter
should at least be construed in light of the number of reported significant digits,
applying ordinary rounding techniques. Nonetheless, the numerical values set forth in
the specific examples are reported as precisely as possible. Any numerical value,
however, inherently contains certain errors from the standard deviation of its
experimental measurement.

[0431] The publications discussed herein are provided solely for their
disclosure prior to the filing date of the present application. Nothing herein is to be
construed as an admission that the present invention is not entitled to antedate such
publication by virtue of prior invention. Further, the dates of publication provided
may be different from the actual publication dates which may need to be
independently confirmed.
Example 1: Expression in E. coli

[0432] Sequences can be expressed in E. coli. Any one or more of the
sequences according to SEQ ID NOS.: 1 - 209 and 419 - 627 can be expressed in E.
coli by subcloning the entire coding region, or a selected portion thereof, into a
prokaryotic expression vector. For example, the expression vector pQE16 from the
QIAexpress prokaryotic protein expression system (Qiagen, Valencia, CA) can be
used. The features of this vector that make it useful for protein expression include an
efficient promoter (phage T5) to drive transcription, expression control provided by
the lac operator system, which can be induced by addition of IPTG
(isopropyl-beta-D-thiogalactopyranoside), and an encoded 6XHis tag coding sequence.
The latter is a stretch of six histidine amino acid residues which can bind very tightly
to a nickel atom. This vector can be used to express a recombinant protein with a
6XHis tag fused to its carboxyl terminus, allowing rapid and efficient purification
using Ni-coupled affinity columns.
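
The carboxyl-terminal 6XHis fusion described above can be illustrated in silico. The following is only a sketch under stated assumptions: in pQE16 the tag and stop codon are supplied by the vector itself, whereas here they are appended directly to an insert sequence to show the resulting open reading frame. The function name, the choice of CAT as the histidine codon (CAC encodes histidine as well), the TAA stop codon, and the toy coding sequence are all hypothetical and are not taken from the specification.

```python
# Sketch: build the ORF of a C-terminally 6XHis-tagged protein from a coding sequence.
HIS_CODON = "CAT"                      # assumption: CAC would serve equally well
STOP_CODONS = {"TAA", "TAG", "TGA"}


def add_c_terminal_his_tag(cds: str, n_his: int = 6, stop: str = "TAA") -> str:
    """Return cds with its own stop codon removed, followed by n_his His codons and a stop."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if seq[-3:] in STOP_CODONS:
        seq = seq[:-3]                 # the insert must read through into the tag
    return seq + HIS_CODON * n_his + stop


if __name__ == "__main__":
    toy_cds = "ATGGCTAAATAA"           # hypothetical insert: Met-Ala-Lys-stop
    print(add_c_terminal_his_tag(toy_cds))
```

The native stop codon is removed because a carboxyl-terminal tag is only translated if the insert remains in frame and reads through into the tag sequence.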

[0433] The entire or the selected partial coding region can be amplified by
PCR, then ligated into digested pQE16 vector. The ligation product can be
transformed by electroporation into electrocompetent E. coli cells (for example, strain
M15[pREP4] from Qiagen), and the transformed cells may be plated on
ampicillin-containing plates. Colonies may then be screened for the correct insert in
the proper orientation using a PCR reaction employing a gene-specific primer and a
vector-specific primer. Also, positive clones can be sequenced to ensure correct
orientation and sequence. To express the proteins, a colony containing a correct
recombinant clone can be inoculated into L-Broth containing 100 µg/ml of ampicillin
and 25 µg/ml of kanamycin, and the culture allowed to grow overnight at 37 degrees C.
The saturated culture may then be diluted 20-fold in the same medium and allowed to
grow to an optical density of 0.5 at 600 nm. At this point, IPTG can be added to a
final concentration of 1 mM to induce protein expression. After growing the culture
for an additional 5 hours, the cells may be harvested by centrifugation at 3000 times g
for 15 minutes.
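
By way of illustration only, the dilution and induction arithmetic of the preceding paragraph can be sketched in Python. The function names, the 100 ml culture volume, and the 1 M IPTG stock concentration are assumptions chosen for the example and are not taken from the specification; the small volume contributed by the IPTG stock is ignored.

```python
def dilution_volumes(final_volume_ml, fold):
    """Split a final culture volume into (inoculum, fresh medium) for an n-fold dilution."""
    if fold <= 1:
        raise ValueError("fold must be greater than 1")
    inoculum_ml = final_volume_ml / fold
    return inoculum_ml, final_volume_ml - inoculum_ml


def stock_volume_ul(final_conc_mM, culture_volume_ml, stock_conc_mM):
    """Volume of stock (in microliters) from C1*V1 = C2*V2.

    The volume added by the stock itself is neglected.
    """
    if stock_conc_mM <= final_conc_mM:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc_mM * culture_volume_ml * 1000.0 / stock_conc_mM


if __name__ == "__main__":
    # Hypothetical 100 ml induction culture, diluted 20-fold from the overnight culture.
    inoculum, medium = dilution_volumes(100.0, 20)
    print(f"overnight culture: {inoculum} ml, fresh medium: {medium} ml")
    # Hypothetical 1 M (1000 mM) IPTG stock, 1 mM final concentration.
    print(f"IPTG stock to add: {stock_volume_ul(1.0, 100.0, 1000.0)} microliters")
```

Under these assumed values, 5 ml of saturated culture is combined with 95 ml of medium, and 100 microliters of the stock gives approximately 1 mM IPTG.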

[0434] The resultant pellet can be lysed with a mild, nonionic detergent in 20
mM Tris HCl (pH 7.5) (B-PER™ Reagent from Pierce, Rockford, IL), or by
sonication until the turbid cell suspension turns translucent. The resulting lysate can
be further purified using a nickel-containing column (Ni-NTA spin column from
Qiagen) under non-denaturing conditions. Briefly, the lysate will be adjusted to 300
mM NaCl and 10 mM imidazole, then centrifuged at 700 times g through the nickel
spin column to allow the His-tagged recombinant protein to bind to the column. The
column will be washed twice with wash buffer (for example, 50 mM NaH2PO4, pH
8.0; 300 mM NaCl; 20 mM imidazole) and eluted with elution buffer (for example, 50
mM NaH2PO4, pH 8.0; 300 mM NaCl; 250 mM imidazole). All the above
procedures will be performed at 4 degrees C. The presence of a purified protein of
the predicted size can be confirmed with SDS-PAGE.

Example 2: Expression in Mammalian Cells

[0435] The sequences encoding the proteins of Example 1 can be cloned into
the pENTR vector (Invitrogen) by PCR and transferred to the mammalian expression
vector pDEST12.2 per manufacturer's instructions (Invitrogen). Introduction of the
recombinant construct into the host cell can be effected by transfection with Fugene 6
(Roche) per manufacturer's instructions. The host cells containing one of the
polynucleotides of the invention can be used in conventional manners to produce the
gene product encoded by the isolated fragment (in the case of an ORF). A number of
types of cells can act as suitable host cells for expression of the proteins. Mammalian
host cells include, for example, monkey COS cells, Chinese Hamster Ovary (CHO)
cells, human kidney 293 cells, human epidermal A431 cells, human Colo205 cells,
3T3 cells, CV-1 cells, other transformed primate cell lines, normal diploid cells, cell
strains derived from in vitro culture of primary tissue, primary explants, HeLa cells,
mouse L cells, BHK, HL-60, U937, HaK or Jurkat cells.

Example 3: Expression in Cell-Free Translation Systems 

[0436] Cell-free translation systems can also be employed to produce
proteins using RNAs derived from the DNA constructs of the present invention.
Appropriate cloning and expression vectors containing SP6 or T7 promoters for use
with prokaryotic and eukaryotic hosts have been described (Sambrook et al., 1989).
These DNA constructs can be used to produce proteins in a rabbit reticulocyte lysate
system or in a wheat germ extract system.

[0437] Specific expression systems of interest include plant, bacterial, yeast,
insect cell and mammalian cell derived expression systems. Expression systems in
plants include those described in U.S. Patent No. 6,096,546 and U.S. Patent No.
6,127,145. Expression systems in bacteria include those described by Chang et al.,
1978, Goeddel et al., 1979, Goeddel et al., 1980, EP 0 036,776, U.S. Patent No.
4,551,433, DeBoer et al., 1983, and Siebenlist et al., 1980.

[0438] Mammalian expression is further accomplished as described in
Dijkema et al., 1985, Gorman et al., 1982, Boshart et al., 1985, and U.S. Patent No.
4,399,216. Other features of mammalian expression are facilitated as described in
Ham and Wallace, Meth. Enz., 1979, Barnes and Sato, 1980, U.S. Patent Nos.
4,767,704, 4,657,866, 4,927,762, 4,560,655, WO 90/103430, WO 87/00195, and U.S.
RE 30,985.

Example 4: Expression of the Secreted Factors in Yeast 

[0439] Primers can be designed to amplify the secreted factors using PCR,
and the amplified products can be cloned into pENTR/D-TOPO vectors (Invitrogen,
Carlsbad, CA). The secreted factors in pENTR/D-TOPO can be cloned into the yeast
expression vector pYES-DEST52 by Gateway LR reaction (Invitrogen, Carlsbad, CA).
The resulting yeast expression vectors can be transformed into INVSc1 strain from
Invitrogen to express the secreted factors according to the manufacturer's protocol
(Invitrogen, Carlsbad, CA). The expressed secreted factors will have a 6XHis tag at
the C-terminus. Expressed protein can be purified with ProBond™ resin (Invitrogen,
Carlsbad, CA).

[0440] Expression systems in yeast include those described in Hinnen et al.,
1978, Ito et al., 1983, Kurtz et al., 1986, Kunze et al., 1985, Gleeson et al., 1986,
Roggenkamp et al., 1986, Das et al., 1984, De Louvencourt et al., 1983, Van den Berg
et al., 1990, Kunze et al., 1985, Cregg et al., 1985, U.S. Patent No. 4,837,148, U.S.
Patent No. 4,929,555, Beach and Nurse, 1981, Davidow et al., 1985, Gaillardin et al.,
1985, Ballance et al., 1983, Tilburn et al., 1983, Yelton et al., 1984, Kelly and Hynes,
1985, EP 0 244,234, and WO 91/00357.

Example 5: Expression of Secreted Factors in Baculovirus Expression 
System. 

[0441] The secreted factors in pENTR/D-TOPO can be cloned into
Baculovirus expression vector pDEST10 by Gateway LR reaction (Invitrogen,
Carlsbad, CA). The secreted factors can be expressed by the Bac-to-Bac expression
system from Invitrogen (Carlsbad, CA), briefly described as follows. The expression
vectors containing the secreted factors are transformed into competent DH10Bac™ E.
coli strain and selected for transposition. The resulting E. coli contain recombinant
bacmid that contains the secreted factor. High molecular weight DNA can be isolated
from the E. coli containing the recombinant bacmid and then transfected into insect
cells with Cellfectin reagent. The expressed secreted factors will have a 6XHis tag at
the N-terminus. Expressed protein will be purified by ProBond™ resin (Invitrogen,
Carlsbad, CA).

[0442] Expression of heterologous genes in insects can be accomplished as
described in U.S. Patent No. 4,745,051; Doerfler et al., 1987; Friesen et al., 1986; EP
0 127,839, EP 0 155,476, Vlak et al., 1988, Miller et al., 1988, Carbonell et al., 1988,
Maeda et al., 1985, Lebacq-Verheyden et al., 1988, Smith et al., 1985, Miyajima et
al.; and Martin et al., 1988. Numerous baculoviral strains and variants and
corresponding permissive insect host cells from hosts have been previously described
(Setlow et al., 1986, Luckow et al., 1988; Miller et al., 1986; Maeda et al., 1985).

Example 6: Primer Design 

[0443] To design the forward primer for PCR amplification, the melting
point of the first 20 to 24 bases of the primer can be calculated by counting total A
and T residues, then multiplying by 2. To design the reverse primer for PCR
amplification, the melting point of the first 20 to 24 bases of the reverse complement,
with the sequences written from 5-prime to 3-prime, can be calculated by counting the
total G and C residues, then multiplying by 4. Both start and stop codons can be
present in the final amplified clone. The length of the primers is chosen to obtain
melting temperatures within 63 degrees C to 68 degrees C. Adding the bases "CACC"
to the forward primer renders it compatible for cloning the PCR product with the
TOPO pENTR/D (Invitrogen, CA).
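
The counting rule above can be sketched in Python as follows. This is a minimal illustration, not the applicants' software: it assumes that the two products (2 per A or T residue, 4 per G or C residue) are summed into a single estimate, as in the commonly used Wallace rule, and that the primer is taken as the shortest 20 to 24 base prefix whose estimate falls in the 63 to 68 degrees C window. The function names and the example sequence are hypothetical.

```python
def wallace_tm(primer):
    """Estimate melting temperature (degrees C) as 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Return the reverse complement, written 5-prime to 3-prime."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def pick_primer(template, low=63, high=68, min_len=20, max_len=24):
    """Return the shortest prefix of 20 to 24 bases whose estimate is in range, else None."""
    for n in range(min_len, max_len + 1):
        if n > len(template):
            break
        candidate = template[:n].upper()
        if low <= wallace_tm(candidate) <= high:
            return candidate
    return None


def forward_primer(coding_strand):
    """Pick a forward primer and prepend the CACC bases used for directional TOPO cloning."""
    core = pick_primer(coding_strand)
    return None if core is None else "CACC" + core


def reverse_primer(coding_strand):
    """Pick a reverse primer from the reverse complement of the coding strand."""
    return pick_primer(reverse_complement(coding_strand))


if __name__ == "__main__":
    # Hypothetical coding-strand fragment, for illustration only.
    fragment = "ATGGCCGCTAGCGGATCCAAGCTTGAAT"
    print(forward_primer(fragment))
```

Because the estimate depends only on base counts, it is a rough guide; primers chosen this way would ordinarily be checked by other means before use.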

Example 7: Reverse Transcriptase Reaction

[0444] cDNA can be prepared by the following method. Between 200 ng
and 1.0 µg mRNA is added to 2 µl DMSO and the volume adjusted to 11 µl with
DEPC-treated water. One µl Oligo dT is added to the tube, and the mixture is heated
at 70° C for 5 min., quickly chilled on ice for 2 min., and the mixture is collected at
the bottom of the tube by brief centrifugation. The following 1st strand components
are then added to the mRNA mixture: 2 µl 10X Stratascript (Stratagene, CA) 1st strand
buffer, 1 µl 0.1 M DTT, 1 µl 10 mM dNTP mix (10 mM each of dGTP, dATP, dTTP,
and dCTP), 1 µl RNAse inhibitor, 3 µl Stratascript RT (50 U/µl). The contents are gently
mixed and the mixture collected by brief centrifugation. The mixture is incubated in a
42° C water bath for 1 hour, placed in a 70° C water bath for 15 min. to stop the
reaction, transferred to ice for 2 min., and centrifuged briefly in a microfuge to collect
the reaction product at the bottom of the reaction vessel. Two µl RNAse H is then
added to the tube, the contents are mixed well, incubated at 37° C in a water bath for
20 min., and centrifuged briefly in a microfuge to collect the reaction product at the
bottom of the reaction vessel. The reaction mixture can proceed directly to PCR or be
stored at -20° C.
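
As a check on the arithmetic of the first-strand reaction, the component volumes can be tallied in Python. This sketch assumes that all volumes are in microliters and that the reverse transcriptase stock is 50 U per microliter; the dictionary keys and function names are illustrative only. The figures describe the reaction before the RNAse H addition.

```python
# Component volumes in microliters, as read from the paragraph above (an assumption).
RT_COMPONENTS_UL = {
    "mRNA + DMSO + DEPC-treated water": 11.0,
    "oligo dT": 1.0,
    "10X first-strand buffer": 2.0,
    "0.1 M DTT": 1.0,
    "10 mM dNTP mix": 1.0,
    "RNAse inhibitor": 1.0,
    "reverse transcriptase": 3.0,
}


def total_volume_ul(components):
    """Sum the component volumes of a reaction."""
    return sum(components.values())


def final_concentration(stock, added_ul, total_ul):
    """Final concentration after dilution, in the same units as the stock."""
    return stock * added_ul / total_ul


if __name__ == "__main__":
    total = total_volume_ul(RT_COMPONENTS_UL)
    print("total volume (ul):", total)
    print("buffer (X):", final_concentration(10.0, 2.0, total))
    print("DTT (mM):", final_concentration(100.0, 1.0, total))
    print("each dNTP (mM):", final_concentration(10.0, 1.0, total))
    print("RT units:", 3.0 * 50.0)  # assumes a 50 U/ul enzyme stock
```

On these assumptions the reaction totals 20 microliters, which brings the 10X buffer to 1X.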

Example 8: Full Length PCR

[0445] Full length PCR can be achieved by placing the products of the
reaction described in Example 7, with primers diluted to 5 µM in water, into a reaction
vessel and adding a reaction mixture composed of 1x Taq buffer, 25 mM dNTP, 10 ng
cDNA pool, TaqPlus (Stratagene, CA) (5 U/µl), PfuTurbo (Stratagene, CA) (2.5 U/µl),
and water. The contents of the reaction vessel are then mixed gently by inversion 5-6
times, placed into a reservoir where 2 µl F1/R1 primers are added, the plate sealed and
placed in the thermocycler. The PCR reaction is comprised of the following eight
steps. Step 1: 95° C for 3 min. Step 2: 94° C for 45 sec. Step 3: 0.5° C/sec to 56-60°
C. Step 4: 56-60° C for 50 sec. Step 5: 72° C for 5 min. Step 6: Go to step 2,
perform 35-40 cycles. Step 7: 72° C for 20 min. Step 8: 4° C.
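
The eight-step program can be written down as a short Python sketch that estimates its duration. Only the hold times and the controlled 0.5° C/sec ramp of Step 3 are counted; other ramp times depend on the instrument, so the result is a lower bound. The function names and default values are assumptions made for illustration, and the final 4° C hold is excluded because it has no fixed length.

```python
DENATURE_C = 94.0      # Step 2 temperature
RAMP_C_PER_SEC = 0.5   # Step 3 controlled ramp rate


def cycle_seconds(anneal_c):
    """Seconds for one pass through Steps 2 to 5 at a given annealing temperature."""
    if not 56.0 <= anneal_c <= 60.0:
        raise ValueError("annealing temperature must be within 56-60 degrees C")
    ramp = (DENATURE_C - anneal_c) / RAMP_C_PER_SEC
    # 45 sec denaturation + ramp + 50 sec annealing + 5 min extension
    return 45 + ramp + 50 + 300


def program_seconds(anneal_c=58.0, cycles=35):
    """Lower-bound duration of Steps 1 to 7, in seconds."""
    if not 35 <= cycles <= 40:
        raise ValueError("cycle count must be within 35-40")
    # 3 min initial denaturation + cycles + 20 min final extension
    return 180 + cycles * cycle_seconds(anneal_c) + 1200


if __name__ == "__main__":
    seconds = program_seconds(anneal_c=58.0, cycles=35)
    print(f"at least {seconds / 3600:.1f} hours")
```

With the assumed 58° C annealing temperature and 35 cycles, the counted steps alone take a little under five hours, most of it in the 5 min extension of each cycle.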

[0446] The products can then be separated on a standard 0.8 to 1.0% agarose
gel at 40 to 80 V, the bands of interest excised by cutting from the gel, and stored at
-20° C until extraction. The material in the bands of interest can be purified with
QIAquick 96 PCR Purification Kit (Qiagen, CA) according to the manufacturer's
instructions. Cloning can be performed with the TOPO vector pENTR/D-TOPO
(Invitrogen, CA) according to the manufacturer's instructions.

CLAIMS

1. A first nucleic acid molecule comprising a polynucleotide sequence
chosen from at least one polynucleotide sequence according to SEQ ID NOS.: 1-209;
SEQ ID NOS.: 419-627, or a complement thereof, or from at least one polynucleotide
sequence that encodes SEQ ID NOS.: 210-418.

2. The nucleic acid molecule of claim 1, wherein the nucleic acid
molecule is a DNA or a RNA molecule.

3. An animal injected with the nucleic acid molecule of claim 1.

4. A double-stranded isolated nucleic acid molecule comprising the first 
nucleic acid molecule of claim 1 and its complement. 

5. The nucleic acid molecule of claim 4, wherein the first polynucleotide
sequence encodes a polypeptide chosen from a polypeptide comprising a signal
peptide, a mature polypeptide that lacks a signal peptide, a signal peptide, a
biologically active fragment of a polypeptide, a polypeptide lacking a signal peptide
cleavage site, a polypeptide consisting essentially of a N-terminal fragment that
contains a Pfam domain, and a polypeptide consisting essentially of a C-terminal
fragment that contains a Pfam domain.

6. A second nucleic acid molecule comprising a second polynucleotide
sequence that is at least about 70%, or about 80%, or about 90%, or about 95%
homologous to the first nucleic acid molecule of claim 1.

7. A second isolated nucleic acid molecule comprising a second
polynucleotide sequence that hybridizes to the first polynucleotide sequence of claim
1 under high stringency conditions.

8. The second isolated nucleic acid molecule of claim 6, wherein the 
second polynucleotide sequence is complementary to the first polynucleotide 
sequence. 

9. A vector comprising the nucleic acid molecule of claim 1 and a 
promoter that drives the expression of the nucleic acid molecule. 

10. The vector of claim 9, wherein the promoter is chosen from one or
more of a promoter that is naturally contiguous to the nucleic acid molecule, a
promoter that is not naturally contiguous to the nucleic acid molecule, an inducible
promoter, a conditionally active promoter, a constitutive promoter, and a tissue
specific promoter.

11. A host cell transformed, transfected, transduced, or infected with the
nucleic acid molecule of claim 1.

12. The host cell of claim 11, wherein the cell is chosen from one or more
of a prokaryotic cell, a eukaryotic cell, a human cell, a mammalian cell, an insect cell,
a fish cell, a plant cell, and a fungal cell.

13. A nucleic acid composition comprising a pharmaceutically acceptable
carrier or a buffer and one or more compositions chosen from the nucleic acid
molecule of claim 1, the nucleic acid molecule of claim 4, the vector of claim 9, and
the host cell of claim 11.

14. One or more polypeptide molecules comprising a polypeptide
sequence chosen from at least one amino acid sequence according to SEQ ID NOS.:
210-418.

15. An animal injected with the polypeptide molecule of claim 14.

16. The polypeptide of claim 14, wherein the polypeptide has a function
chosen from an agonist, an antagonist, a ligand, and a receptor.

17. The polypeptide of claim 14, wherein the polypeptide is chosen from a
polypeptide comprising a signal peptide, a mature polypeptide that lacks a signal
peptide, a signal peptide, a biologically active fragment of a polypeptide, a
polypeptide lacking a signal peptide cleavage site, a biologically active fragment
consisting essentially of an N-terminal fragment containing a Pfam domain, and a
C-terminal fragment containing a Pfam domain.

18. A polypeptide composition comprising the polypeptide molecule of
claim 14 and a pharmaceutically acceptable carrier or a buffer.

19. A cell culture medium comprising the polypeptide of claim 14. 

20. The cell culture medium of claim 19, further comprising responder
cells chosen from one or more T cells, B cells, NK cells, dendritic cells, macrophages,
muscle cells, stem cells, epithelial skin cells, fat cells, blood cells, brain cells, bone
marrow cells, endothelial cells, retinal cells, bone cells, kidney cells, pancreatic cells,
liver cells, spleen cells, prostate cells, cervical cells, ovarian cells, breast cells, lung
cells, liver cells, soft tissue cells, colorectal cells, cells of the gastrointestinal tract, and
cancer cells.

21. The cell culture medium of claim 20, wherein the responder cells
proliferate in the medium.

22. The cell culture medium of claim 20, wherein the responder cells are
inhibited in the medium.

23. A cell culture comprising transfected cells, wherein the transfected
cells are transfected with the polynucleotide of claim 1.

24. The cell culture of claim 23, further comprising responder cells chosen
from one or more T cells, B cells, NK cells, dendritic cells, macrophages, muscle
cells, stem cells, epithelial skin cells, fat cells, blood cells, brain cells, bone marrow
cells, endothelial cells, retinal cells, bone cells, kidney cells, pancreatic cells, liver
cells, spleen cells, prostate cells, cervical cells, ovarian cells, breast cells, lung cells,
liver cells, soft tissue cells, colorectal cells, cells of the gastrointestinal tract, and
cancer cells.

25. The cell culture of claim 23, wherein the responder cells proliferate in
the cell culture.

26. The cell culture of claim 23, wherein the responder cells are inhibited
in the cell culture.

27. A method of making a transformed, transfected, transduced, or infected
host cell comprising:

(a) providing a composition comprising the vector of claim 9, and

(b) allowing a host cell to come into contact with the vector to form a
transformed, transfected, transduced, or infected host cell.

28. A method of making a polypeptide comprising:

(a) providing a nucleic acid molecule that comprises a
polynucleotide sequence encoding the polypeptide of claim 14;

(b) introducing the nucleic acid molecule into an expression
system; and

(c) allowing the polypeptide to be produced.

29. A method of making a polypeptide comprising:

(a) providing a composition comprising the host cell of claim 11;

(b) culturing the host cell to produce the polypeptide; and

(c) allowing the polypeptide to be produced.

30. A diagnostic kit comprising a polynucleotide molecule, wherein the
polynucleotide molecule comprises a sequence chosen from (a) at least 6, (b) at least
7, (c) at least 8, and (d) at least 9 contiguous nucleotides chosen from the nucleic acid
molecule of claim 1.

31. A diagnostic kit comprising a polypeptide molecule, wherein the
polypeptide molecule comprises an amino acid sequence or a biologically active
fragment thereof, derived from the nucleic acid molecule of claim 1.

32. A genetically modified mouse comprising a deletion, substitution, or
modification of a sequence chosen from SEQ ID NOS.: 1-209; SEQ ID NOS.: 419-627,
wherein the deletion, substitution or modification prevents or reduces expression
of said sequence and results in a mouse deficient in or completely lacking one or more
gene products of a sequence chosen from SEQ ID NOS.: 1-209; SEQ ID NOS.: 419-627.

33. A method of determining the presence of the nucleic acid molecule of
claim 1 or its complement comprising:

(a) providing a complement to the nucleic acid molecule or providing a
complement to the complement of the nucleic acid molecule;

(b) allowing the molecules to interact; and

(c) determining whether interaction has occurred.

34. A method of determining the presence of an antibody to the
polypeptide of claim 14 in a sample, comprising:

(a) providing the polypeptide;

(b) allowing the polypeptide to interact with any specific antibody in the
sample; and

(c) determining whether interaction has occurred.

35. A cell-free medium comprising the polypeptide of claim 14.

36. The cell-free medium of claim 35, further comprising lysates chosen
from bacterial cells and eukaryotic cells.

37. The cell-free medium of claim 36, wherein the eukaryotic cells are
wheat germ cells.

38. A non-human animal comprising the polynucleotide of claim 1,
wherein the animal produces a human protein.

39. A non-human eukaryotic cell comprising the polynucleotide of claim
1, wherein the cell produces a human protein.

40. A bacterial cell comprising the polynucleotide of claim 1, wherein the
cell produces a human protein.